We study the set of solutions of random $k$-satisfiability formulae through the cavity method. It is known that, for an interval of the clause-to-variables ratio, this decomposes into an exponential number of pure states (clusters). We refine substantially this picture by: (i) determining the precise location of the clustering transition; (ii) uncovering a second 'condensation' phase transition in the structure of the solution set for $k \ge 4$. These results both follow from computing the large deviation rate of the internal entropy of pure states. From a technical point of view our main contributions are a simplified version of the cavity formalism for special values of the Parisi replica symmetry breaking parameter $m$ (in particular for $m = 1$ via a correspondence with the tree reconstruction problem) and new large-$k$ expansions.



I. INTRODUCTION 



An instance of $k$-satisfiability ($k$-SAT) consists of a Boolean formula in conjunctive normal form in which each elementary clause is the disjunction of $k$ literals (a Boolean variable or its negation). Solving it amounts to determining whether there exists an assignment of the variables such that at least one literal in each clause evaluates to true. The $k$-SAT problem plays a central role in the theory of computational complexity, being the first decision problem proven to be NP-complete [1] (for all $k \ge 3$). Its optimization (minimize the number of unsatisfied clauses) and enumeration (count the number of optimal assignments) versions are defined straightforwardly and are also hard from the computational point of view.

Random $k$-satisfiability is the ensemble defined by drawing a uniformly random formula among all the ones involving $M$ $k$-clauses over $N$ variables. Equivalently, each of the $M$ clauses is drawn uniformly over the $2^k \binom{N}{k}$ possible ones, independently from the others. It was observed empirically early on [2] that, by tuning the clause density $\alpha = M/N$, this ensemble could produce formulae which were hard for known algorithms. Hardness was argued to be related to a sharp threshold in the satisfiability probability, emerging as $N \to \infty$ with $\alpha$ fixed. More precisely, it is believed that there exists a constant $\alpha_s(k)$ such that random formulae are with high probability^1 satisfiable if $\alpha < \alpha_s(k)$ and unsatisfiable if $\alpha > \alpha_s(k)$. The existence of a sharp threshold was proven in [3], with, however, a critical point $\alpha_s(k, N)$ which might not converge when $N \to \infty$. Despite important progress [JlBj @] the rigorous proof of the existence and determination of $\alpha_s(k)$ remains a major open problem (with the notable exception of $k = 2$ [3]).
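
The ensemble just defined is easy to sample. The following is a minimal sketch, not taken from the paper: the function names `random_ksat` and `is_satisfied` and the encoding of a clause as a tuple of `(variable, sign)` pairs are assumptions made for illustration.

```python
import random

def random_ksat(n_vars, n_clauses, k, rng=None):
    """Draw n_clauses clauses independently and uniformly among the
    2^k * C(n_vars, k) possible k-clauses.

    A clause is a tuple of (variable_index, sign) pairs; sign = +1 stands
    for the variable itself and sign = -1 for its negation.
    """
    rng = rng or random.Random()
    formula = []
    for _ in range(n_clauses):
        variables = sorted(rng.sample(range(n_vars), k))  # k distinct variables
        formula.append(tuple((v, rng.choice((+1, -1))) for v in variables))
    return formula

def is_satisfied(formula, assignment):
    """assignment[v] is +1 (true) or -1 (false); every clause needs one true literal."""
    return all(any(assignment[v] == sign for v, sign in clause) for clause in formula)

# Clause density alpha = M / N = 1.5 for N = 20 variables and k = 3.
formula = random_ksat(n_vars=20, n_clauses=30, k=3, rng=random.Random(0))
```

Sorting the sampled variables only fixes a canonical ordering inside each clause; it does not change the distribution over clauses.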
The connection between threshold phenomena and phase transitions spurred a considerable amount of work [1, 01 HIIj using techniques from the theory of mean field spin glasses [T^j. The main outcomes of this approach have been: (i) A precise conjecture on the location of the satisfiability threshold $\alpha_s(k)$ [13, [l^; (ii) The suggestion (ol. flol| for $k \ge 3$ of another transition at $\alpha_d(k) < \alpha_s(k)$ affecting the geometry of the solution space; (iii) Most strikingly, the proposal of a new and extremely effective message passing algorithm, Survey Propagation (SP) [13, [HI. This exploits a detailed statistical picture of the solution space to efficiently find solutions.

According to statistical physics studies, in the intermediate regime $\alpha \in [\alpha_d(k), \alpha_s(k)]$ solutions tend to group themselves in clusters that are somehow disconnected. As $\alpha$ increases, the number of these clusters decreases. The satisfiability transition is thus due to the vanishing of the number of clusters, which still contain a large number of solutions just before $\alpha_s(k)$. The phase transition at $\alpha_d(k)$ has been referred to as the "clustering phase transition" or the "dynamic phase transition" depending on the feature emphasized. Its nature and location, as well as a refined description of the regime $\alpha \in [\alpha_d(k), \alpha_s(k)]$, will be the main topic of this paper. More precisely:

(i) We will argue that previous determinations of $\alpha_d(k)$ [lHj have to be corrected when fluctuations of the cluster sizes are taken into account;

(ii) We will uncover (for $k \ge 4$) a new 'condensation' phase transition at $\alpha_c(k) \in [\alpha_d(k), \alpha_s(k)]$. For $\alpha \in [\alpha_d(k), \alpha_c(k)]$ the relevant clusters are exponentially numerous. For $\alpha \in [\alpha_c(k), \alpha_s(k)]$ most of the solutions are contained in a number of clusters that remains bounded as $N \to \infty$.

^1 Here and below 'with high probability' (w.h.p.) means with probability converging to 1 as $N \to \infty$.

The paper is organized as follows. In Section II we recall some general features of mean-field disordered models, emphasizing the notions of dynamical transitions and replica symmetry breaking. In Section III we define more precisely the ensemble of random formulas studied and describe the replica symmetric (RS) and one step of replica symmetry breaking (1RSB) approach to this model. We then apply the program of Section II to the random $k$-satisfiability problem and present our main results in Section IV. For the sake of clarity some technicalities of the 1RSB treatment are presented shortly afterward, see Section V. To complement these results, which are partly based on a numerical resolution of integral equations, we present in Section VI an asymptotic expansion in the large $k$ limit which lends further support to our claims. We draw our conclusions in Sec. VII. Technical details are deferred to three appendices.

A short account of our results has been published in [13], and a detailed analysis of the related $q$-coloring problem in [15]. While the present work was being finished two very interesting papers confirmed the generality of these results. The first concerned 3-SAT [16] and the second bi-coloring of random hypergraphs [17].



II. MEAN-FIELD DISORDERED SYSTEMS 

The goal of this section is to provide a quick overview of the cavity method [H, . We will further propose a 
more precise mathematical formulation of several notions that are crucial in the statistical physics approach. 



A. Statistical mechanics and graphical models 



Let us start by considering a general model defined by: 

(1) A factor graph [l^, i.e. a bipartite graph $G = (V, F, E)$. Here $V$, $|V| = N$, are 'variable nodes' corresponding to variables, $F$, $|F| = M$, are 'function (or factor) nodes' describing interactions among these variables, and $E$ are edges between variables and factors. Given $i \in V$ (resp. $a \in F$), we shall denote by $\partial i = \{a \in F : (ia) \in E\}$ (resp. $\partial a = \{i \in V : (ia) \in E\}$) its neighborhood. Further, given $i, j \in V$, we let $d(i,j)$ be their graph theoretic distance (the minimal number of factor nodes encountered on a path between $i$ and $j$).

(2) A space of configurations $\mathcal{X}^N$, with $\mathcal{X}$ a finite alphabet (a configuration will be denoted in the following as $\underline{\sigma} = (\sigma_1, \dots, \sigma_N) \in \mathcal{X}^N$). For any set $A \subseteq V$, we let $\underline{\sigma}_A = \{\sigma_i : i \in A\}$.

(3) A set of non-negative weights $\{w_a : a \in F\}$, $w_a : \mathcal{X}^{\partial a} \to \mathbb{R}_+$, $\underline{\sigma}_{\partial a} \mapsto w_a(\underline{\sigma}_{\partial a})$. In the case of constraint satisfaction problems, these are often taken to be indicator functions (more details on this particular case will be given in Sec. II D).

Given these ingredients, a measure over $\mathcal{X}^N$ is defined as

$$ \mu_N(\underline{\sigma}) = \frac{1}{Z_N} W_N(\underline{\sigma}) , \qquad W_N(\underline{\sigma}) = \prod_{a \in F} w_a(\underline{\sigma}_{\partial a}) . \tag{1} $$

This is well defined only if there exists at least one configuration $\underline{\sigma}^*$ that makes all the weights strictly positive, namely $w_a(\underline{\sigma}^*_{\partial a}) > 0$ for each $a$. We will assume this to be the case throughout the paper (i.e. we focus on the 'satisfiable' phase). Further, it will be understood that we consider sequences of graphs (and weights) of diverging size (although we shall often drop the subscript $N$).

An important role is played by the large-$N$ behavior of the partition function $Z_N$. This is described by the free-entropy density^2

$$ \phi = \lim_{N \to \infty} \frac{1}{N} \log Z_N , \qquad Z_N = \sum_{\underline{\sigma}} W_N(\underline{\sigma}) . \tag{2} $$
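
For very small systems the definitions (1) and (2) can be evaluated by exhaustive enumeration. The sketch below is only an illustration of the definitions: the helper names `partition_function` and `forbid`, and the two-constraint example, are assumptions and do not come from the paper.

```python
import itertools
import math

def partition_function(n_vars, factors, alphabet=(-1, +1)):
    """Brute-force Z_N = sum over configurations of prod_a w_a(sigma_{da}).

    `factors` is a list of (neighbors, w_a) pairs: `neighbors` is a tuple of
    variable indices and `w_a` maps the tuple of their values to a weight >= 0.
    The cost is |alphabet|^n_vars, so this is usable only for tiny instances.
    """
    total = 0.0
    for sigma in itertools.product(alphabet, repeat=n_vars):
        weight = 1.0
        for neighbors, w_a in factors:
            weight *= w_a(tuple(sigma[i] for i in neighbors))
        total += weight
    return total

def forbid(pattern):
    """Indicator weight of a constraint violated only by `pattern`."""
    pattern = tuple(pattern)
    return lambda values: 0.0 if values == pattern else 1.0

# Two constraints on three variables; each one excludes a single local pattern.
factors = [((0, 1), forbid((-1, -1))), ((1, 2), forbid((-1, -1)))]
Z = partition_function(3, factors)   # 5 of the 8 configurations survive
phi = math.log(Z) / 3                # finite-N analogue of the free-entropy density
```

With indicator weights, `Z` simply counts the satisfying configurations, which is the case relevant for constraint satisfaction problems.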



B. Pure states and replica symmetry breaking 

The replica/cavity method allows one to compute a hierarchy of approximations to $\phi$. This is thought to yield the exact value of $\phi$ itself in 'mean field' models. The hierarchy is ordered according to the so-called number of steps of replica symmetry breaking (RSB). At each level the calculation is based on some hypotheses on the typical structure of $\mu$, a pivotal role being played by the notion of pure state. Since this concept is only intuitively defined in the physics literature, we propose here two mathematically precise definitions. In both cases a pure state is a (sequence of) probability measures $\rho_N$ on $\mathcal{X}^N$.

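• Definition of pure states through correlation decay
We define the correlation function of $\rho_N$ as

$$ C_N(r) = \sup_{A,B} \sum_{\underline{\sigma}_A, \underline{\sigma}_B} \left| \rho_N(\underline{\sigma}_A, \underline{\sigma}_B) - \rho_N(\underline{\sigma}_A)\, \rho_N(\underline{\sigma}_B) \right| , \tag{3} $$

where the sup is taken over all subsets of variable nodes $A, B \subseteq V$ such that the distance between any pair of nodes $(i,j) \in A \times B$ is greater than $r$. Then $\rho_N$ is a pure state if this correlation function decays at large $r$. Technically, we let $C_\infty(r) \equiv \limsup_{N \to \infty} C_N(r)$, and require $C_\infty(r) \to 0$ as $r \to \infty$.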

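• Definition of pure states through conductance
We let the $(\epsilon, \delta)$-conductance of $\rho_N$ be

$$ J_N(\epsilon, \delta) = \inf_{A \subseteq \mathcal{X}^N} \left\{ \frac{\rho_N(\partial_\epsilon A)}{\rho_N(A)\,(1 - \rho_N(A))} \; : \; \delta \le \rho_N(A) \le 1 - \delta \right\} . \tag{4} $$

Here the inf is taken over all subsets of the configuration space. Further, letting $D$ denote the Hamming distance in $\mathcal{X}^N$, we define the boundary of $A$ as $\partial_\epsilon A = \{\underline{\sigma} \in \mathcal{X}^N \setminus A \,:\, D(\underline{\sigma}, A) \le N\epsilon\}$. With these definitions $\rho_N$ is pure if its conductance is bounded below by an inverse polynomial in $N$ for all $\epsilon$ and $\delta$ (while non-pure states have a conductance which typically decays exponentially with $N$).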

These two definitions mimic the well-known ones on $\mathbb{Z}^d$ in terms of tail triviality and extremality [20]. Further, the second one is clearly related to the behavior of local Monte Carlo Markov chain dynamics. A small conductance amounts to a bottleneck in the distribution and hence to a large relaxation time. While we expect the two definitions to be equivalent for a large family of models, proving this is a largely open problem. Moreover we should emphasize that the heuristic cavity method followed in this paper never explicitly uses either of these definitions.

The hypotheses implicit in the cavity method can be expressed in terms of the pure states decomposition of $\mu$. This is a partition of the configuration space (dependent on the graph and weights) such that the measure $\mu$ constrained to each element of this partition is a pure state. More precisely, let us call $\{A_\gamma\}_\gamma$ a partition of $\mathcal{X}^N$, and define

$$ Z_\gamma = \sum_{\underline{\sigma} \in A_\gamma} w(\underline{\sigma}) , \qquad W_\gamma = \frac{Z_\gamma}{Z} , \qquad \mu_\gamma(\underline{\sigma}) = \frac{1}{Z_\gamma}\, w(\underline{\sigma})\, \mathbb{I}(\underline{\sigma} \in A_\gamma) . \tag{5} $$

Clearly $\mu$ can be written as the convex combination of the $\mu_\gamma$ with coefficients $W_\gamma$. This defines a pure state decomposition if: (i) each of the $\mu_\gamma$ is a pure state in the sense given above, (ii) this is the 'finest' such partition, in the sense that the $\mu_\gamma$ are no longer pure if any subset of them is replaced by their union.

Statistical physics calculations suggest that a wide class of mean field models is described by one of the following 'universal behaviors'. The terminology used here is inherited from the literature on mean field spin glasses [21, 23].

^2 One usually assumes that the limit exists. If the model is disordered, the almost sure limit can be used, or, equivalently, $\log Z_N$ is replaced by its expectation.

RS Most of the measure is contained in a single element of the partition, namely $W_{\max} \equiv \max_\gamma W_\gamma \to 1$ as $N \to \infty$ (replica symmetric).

d1RSB Most of the measure is carried by $\mathcal{N} \doteq e^{N\Sigma_*}$ pure states^3, each one with a weight $W_\gamma \doteq e^{-N\Sigma_*}$ (dynamical one-step replica symmetry breaking).

1RSB The measure condenses on a subexponential number of pure states, namely, if $W_{[\gamma]}$ is the weight of the $\gamma$-th largest state, then $\lim_{n \to \infty} \lim_{N \to \infty} \sum_{\gamma=1}^{n} W_{[\gamma]} = 1$ (one step replica symmetry breaking).

The reader will notice that this list does not include full replica symmetry breaking phases, in which pure states 
are organized according to an ultrametric structure. While this behavior is as generic as the previous ones, our 
understanding of it in sparse graph models is still rather poor. 

We are mostly concerned with families of models of the type defined in Eq. (1) indexed by a continuous parameter $\alpha$ (such as the clause density in $k$-SAT). In this setting, the above behaviors often appear in the order listed above when the system becomes more and more constrained (e.g. as $\alpha$ is increased in $k$-SAT). The different regimes are then separated by phase transitions: the 'dynamical' or 'clustering' phase transition from RS to d1RSB (at $\alpha_d$) and the 'condensation' phase transition between d1RSB and 1RSB (at $\alpha_c$). The paradigmatic example of such transitions is the fully-connected $p$-spin model j2ll. [2^, where they are encountered upon lowering the temperature.

Let us stress that the above definitions are insensitive to what happens in a fraction of the space of configurations 
of vanishing measure. For instance, we neglect metastable states whose overall weight is exponentially small^. 

A convenient tool for distinguishing these various behaviors is the replicated free-entropy [23],

$$ \Phi(m) = \lim_{N \to \infty} \frac{1}{N} \log \Big\{ \sum_\gamma Z_\gamma^m \Big\} , \tag{6} $$

where $m$ is an arbitrary real number (known as the Parisi replica symmetry breaking parameter) which allows one to weight differently the various pure states according to their sizes. Suppose indeed that the number of pure states $\gamma$ with internal free-entropy density $\phi_\gamma = (\log Z_\gamma)/N$ behaves at leading order as $\exp\{N\Sigma(\phi)\}$, where $\Sigma(\phi)$ is known as the complexity (or configurational entropy) of the states. The sum in (6) can then be computed by the Laplace method; if one assumes for simplicity that $\Sigma$ is positive on an interval $[\phi_-, \phi_+]$, this leads to

$$ \Phi(m) = \sup_{\phi \in [\phi_-, \phi_+]} \left[ \Sigma(\phi) + m\phi \right] . \tag{7} $$

Provided $\Sigma$ is concave, it can be reconstructed in a parametric way from $\Phi(m)$ by a Legendre inversion [23],

$$ \Sigma(\phi_{\rm int}(m)) = \Phi(m) - m\,\Phi'(m) , \qquad \phi_{\rm int}(m) = \Phi'(m) , \tag{8} $$

where $m$ is such that the supremum in (7) lies in the interior of $[\phi_-, \phi_+]$, which defines a range $[m_-, m_+]$. Usually $\Sigma$ vanishes continuously at $\phi_+$. As explained below, when zero energy states are concerned $\phi_{\rm int}(m)$ coincides with the internal entropy of such states. Note that a given value of $m$ selects the point of the curve $\Sigma(\phi)$ of slope $-m$; in particular the value $m = 0$ corresponds to the maximum of the curve.
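
The Legendre structure of Eqs. (7) and (8) can be checked numerically on a toy example. The quadratic complexity `sigma_toy` below and all its parameter values are invented for illustration only; they are not the complexity of any model studied in the paper.

```python
import math

def sigma_toy(phi, sigma0=0.05, phi0=0.3, v=0.02):
    """Hypothetical concave complexity, positive for |phi - phi0| < sqrt(2 v sigma0)."""
    return sigma0 - (phi - phi0) ** 2 / (2.0 * v)

def replicated_potential(m, sigma=sigma_toy, phi_lo=0.2, phi_hi=0.4, n_grid=20001):
    """Eq. (7): sup of Sigma(phi) + m phi over the region where Sigma >= 0 (grid search)."""
    best = -math.inf
    for j in range(n_grid):
        phi = phi_lo + (phi_hi - phi_lo) * j / (n_grid - 1)
        s = sigma(phi)
        if s >= 0.0:
            best = max(best, s + m * phi)
    return best

def legendre_inversion(m, potential=replicated_potential, dm=1e-3):
    """Eq. (8): phi_int = Phi'(m) by central differences, Sigma = Phi - m Phi'."""
    slope = (potential(m + dm) - potential(m - dm)) / (2.0 * dm)
    return slope, potential(m) - m * slope

phi_int, complexity = legendre_inversion(1.0)
# For this toy curve the maximiser at m = 1 is phi0 + v = 0.32, where Sigma = 0.04.
```

The inversion recovers the point of the curve with slope $-m$, as stated in the text, as long as the maximiser stays inside the interval where the toy complexity is positive.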

The replica/cavity method at the level of one step of replica symmetry breaking allows one to compute the replicated free-entropy $\Phi(m)$ under an appropriate hypothesis on the organization of pure states. The various regimes can be distinguished through the behavior of this function, namely

RS $\Phi(m) = m\phi_*$, where $\phi_*$ is the contribution of the single dominant pure state, $Z_{[1]} \doteq e^{N\phi_*}$.

d1RSB $\Phi(m)/m$ achieves its minimum for $m \in [0,1]$ at $m = 1$, with $\Sigma_* \equiv \Phi(1) - \Phi'(1) > 0$. Then the measure $\mu$ decomposes into approximately $e^{N\Sigma_*}$ pure states of internal free-entropy $\Phi'(1)$.

1RSB $\Phi(m)/m$ achieves its minimum over the interval $[0,1]$ at $m_s \in (0,1)$. Then the ordered sequence of weights $W_{[1]} > W_{[2]} > W_{[3]} > \cdots$ keeps fluctuating in the thermodynamic limit, and converges to a Poisson-Dirichlet process [24] of parameter $m_s$. The internal free-entropy of these states is $\Phi'(m_s)$.

In all these cases the total free-entropy density is estimated by minimizing $\Phi(m)/m$ in the interval $[0, 1]$.
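
The three-way classification above amounts to locating the minimiser of $\Phi(m)/m$ on $(0, 1]$. The sketch below does this by a plain grid scan. The function `classify_regime` and the quadratic test potentials are hypothetical and stand in for the smooth $\Phi(m)$ that a cavity computation would return. Detecting the RS case through the exact linearity of $\Phi$ is a simplification made here.

```python
def classify_regime(potential, n_grid=1000):
    """Return (regime, m_star, phi_estimate) from a callable Phi(m).

    The estimate of the total free-entropy density is min over (0, 1] of Phi(m)/m.
    """
    ms = [j / n_grid for j in range(1, n_grid + 1)]
    ratios = [potential(m) / m for m in ms]
    if max(ratios) - min(ratios) < 1e-12:       # Phi(m) = m * phi_star: single state
        return "RS", None, ratios[-1]
    best = min(range(n_grid), key=ratios.__getitem__)
    regime = "d1RSB" if best == n_grid - 1 else "1RSB"
    return regime, ms[best], ratios[best]

# Toy potentials Phi(m) = sigma0 + 0.3 m + 0.01 m^2 (invented numbers).
many_states = lambda m: 0.05 + 0.3 * m + 0.01 * m * m    # Sigma at m = 1 is 0.04 > 0
condensed = lambda m: 0.005 + 0.3 * m + 0.01 * m * m     # Sigma at m = 1 is negative

regime_a = classify_regime(many_states)   # minimum sits at the boundary m = 1
regime_b = classify_regime(condensed)     # minimum at m_s = sqrt(0.5), inside (0, 1)
```

In the second toy case the minimiser coincides with the value of $m$ at which $\Phi(m) - m\Phi'(m)$ vanishes, which is the usual way the condensed regime shows up.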



^3 Here and in the following $\doteq$ means equality at the leading exponential order.

^4 In the fully connected models such metastable states are indeed seen as solutions of the Thouless-Anderson-Palmer equations, well above the dynamical phase transition.
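C. Cavity equations

We shall now recall the fundamental equations used within the 1RSB cavity method and propose a somewhat original derivation. In the following we will be interested in factor graphs that converge locally^5 to trees in the thermodynamic limit.

In consequence, let us first consider the case of a model of type (1) whose underlying factor graph is a tree, and discuss later how the long loops are taken into account by the cavity method. Tree factor graph models are easily solved by a 'message passing' procedure jl^. One associates to each directed edge from factor $a$ to variable $i$ (resp. from $i$ to $a$) a "message" $\eta_{a \to i}$ (resp. $\eta_{i \to a}$). Messages are probability measures on $\mathcal{X}$. On trees, they can be defined as the marginal law of $\sigma_i$ with respect to the modified factor graph $G_{a \to i}$ (resp. $G_{i \to a}$) where all factor nodes in $\partial i \setminus a$ (resp. the factor node $a$) have been removed. Simple computations yield the following local equations between messages,

$$ \eta_{a \to i} = f_{a \to i}(\{\eta_{j \to a}\}_{j \in \partial a \setminus i}) , \qquad \eta_{a \to i}(\sigma_i) = \frac{1}{z_{a \to i}} \sum_{\underline{\sigma}_{\partial a \setminus i}} w_a(\underline{\sigma}_{\partial a}) \prod_{j \in \partial a \setminus i} \eta_{j \to a}(\sigma_j) , \tag{9} $$

$$ \eta_{i \to a} = f_{i \to a}(\{\eta_{b \to i}\}_{b \in \partial i \setminus a}) , \qquad \eta_{i \to a}(\sigma_i) = \frac{1}{z_{i \to a}} \prod_{b \in \partial i \setminus a} \eta_{b \to i}(\sigma_i) , \tag{10} $$

where the functions $z$ are fixed by the normalization of the $\eta$'s. As we consider a tree factor graph these equations have a unique solution, easily determined in a single sweep of updates from the leaves of the graph towards its inside. Moreover the free entropy of the model follows from this solution and reads

$$ N\phi = \log Z = - \sum_{(i,a)} \log z_{ia}(\eta_{a \to i}, \eta_{i \to a}) + \sum_{a} \log z_a(\{\eta_{i \to a}\}_{i \in \partial a}) + \sum_{i} \log z_i(\{\eta_{a \to i}\}_{a \in \partial i}) . \tag{11} $$

Here the first sum runs over the undirected edges of the factor graph and the $z$'s are given by

$$ z_{ia} = \sum_{\sigma_i} \eta_{a \to i}(\sigma_i)\, \eta_{i \to a}(\sigma_i) , \qquad z_a = \sum_{\underline{\sigma}_{\partial a}} w_a(\underline{\sigma}_{\partial a}) \prod_{i \in \partial a} \eta_{i \to a}(\sigma_i) , \qquad z_i = \sum_{\sigma_i} \prod_{a \in \partial i} \eta_{a \to i}(\sigma_i) . \tag{12} $$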

This computation is correct only on tree factor graphs. Nevertheless it is expected to yield good estimates of the marginals and free entropy for a number of models on locally tree-like graphs. The belief propagation (BP) algorithm consists in iterating Eqs. (9), (10) in order to find an (approximate) fixed point. In particular, whenever the RS scenario holds, there should be one approximate solution of the above equations that yields the correct leading order of the free entropy density in the thermodynamic limit. In any case, when dealing with random factor graphs, one can always turn this simple computation into a probabilistic one, defining a distribution of random messages by reading (9), (10) in a distributional sense with random weight functions and variables' degrees. The RS estimate of the average free entropy is then obtained by averaging the various terms in (11) with respect to these random messages.

This approach can be refined in the d1RSB and 1RSB regimes. The BP equations (9), (10) should be approximately valid if one computes the messages $\eta_{a \to i}$ and $\eta_{i \to a}$ as marginal laws of the measure $\mu_\gamma$ restricted to a single pure state $\gamma$. When the number of pure states is very large, one considers a distribution (with respect to the pure states $\gamma$ with their weights $W_\gamma$) of messages on each directed edge of the factor graph.

A simple and suggestive derivation of the 1RSB equations goes as follows. Assume that the factor graph is a tree, and choose a subset $B$ of the variable nodes that will act as a boundary, for instance (but not necessarily) the leaves of the factor graph. Each configuration $\underline{\sigma}_B$ of the variables in $B$ induces a conditional distribution $\mu^{\underline{\sigma}_B}$ on the remaining variables,

$$ \mu^{\underline{\sigma}_B}(\underline{\tau}) = \frac{1}{Z^{\underline{\sigma}_B}}\, w(\underline{\tau})\, \mathbb{I}(\underline{\tau}_B = \underline{\sigma}_B) , \tag{13} $$

where here and in the following $\mathbb{I}$ denotes the indicator function of an event, and the normalizing factor $Z^{\underline{\sigma}_B}$ is the partition function restricted to the configurations coinciding with $\underline{\sigma}_B$ on the boundary.






More precisely, any finite neighborhood of a uniformly chosen random vertex converges to a tree. 






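Since the factor graph corresponding to $\mu^{\underline{\sigma}_B}$ is still a tree, the corresponding marginals and partition function $Z^{\underline{\sigma}_B}$ can be computed by iterating the message passing equations (9), (10), with an appropriate prescription for the messages $\eta_{i \to a}$ emerging from variables $i \in B$, namely $\eta_{i \to a}(\tau_i) = \delta_{\sigma_i, \tau_i}$. Let us denote by $\eta^{\underline{\sigma}_B}_{a \to i}$ and $\eta^{\underline{\sigma}_B}_{i \to a}$ the corresponding set of messages, solutions of (9), (10) on all edges of the factor graph. Further define, for $m \in \mathbb{R}$, a probability measure on the boundary conditions as

$$ \tilde{\mu}(\underline{\sigma}_B) = \frac{(Z^{\underline{\sigma}_B})^m}{\sum_{\underline{\sigma}'_B} (Z^{\underline{\sigma}'_B})^m} . \tag{14} $$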

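The idea is to mimic the pure states of a large, loopy factor graph model by the boundary configurations of a tree model. Calling $P_{a \to i}$ (resp. $P_{i \to a}$) the distribution of the messages $\eta^{\underline{\sigma}_B}_{a \to i}$ (resp. $\eta^{\underline{\sigma}_B}_{i \to a}$) with respect to $\tilde{\mu}$^6, a short reasoning reveals that

$$ P_{a \to i}(\eta) = \frac{1}{\mathcal{Z}[\{P_{j \to a}\}, m]} \int \prod_{j \in \partial a \setminus i} dP_{j \to a}(\eta_{j \to a})\; \delta\big(\eta - f_{a \to i}(\{\eta_{j \to a}\})\big)\; z_{a \to i}(\{\eta_{j \to a}\})^m , \tag{15} $$

$$ P_{i \to a}(\eta) = \frac{1}{\mathcal{Z}[\{P_{b \to i}\}, m]} \int \prod_{b \in \partial i \setminus a} dP_{b \to i}(\eta_{b \to i})\; \delta\big(\eta - f_{i \to a}(\{\eta_{b \to i}\})\big)\; z_{i \to a}(\{\eta_{b \to i}\})^m , \tag{16} $$

where the functions $f$ and $z$ are defined in Eqs. (9), (10), and the $\mathcal{Z}[\cdots]$ are normalizing factors determined by the condition $\int dP_{a \to i}(\eta) = \int dP_{i \to a}(\eta) = 1$. Equations (15), (16) coincide with the standard 1RSB equations with Parisi parameter $m$. In addition the free entropy density associated to the law $\tilde{\mu}$, $N\Phi(m) = \log\{\sum_{\underline{\sigma}_B} (Z^{\underline{\sigma}_B})^m\}$, can be shown to be

$$ N\Phi(m) = - \sum_{(i,a) \in E} \log \mathcal{Z}_{ia}[P_{a \to i}, P_{i \to a}, m] + \sum_{a \in F} \log \mathcal{Z}_a[\{P_{i \to a}\}_{i \in \partial a}, m] + \sum_{i \in V} \log \mathcal{Z}_i[\{P_{a \to i}\}_{a \in \partial i}, m] , \tag{17} $$

where the factors $\mathcal{Z}_{\dots}$ are fractional moments of the ones $z_{\dots}$ defined in Eq. (12), namely

$$ \mathcal{Z}_{ia} = \int dP_{a \to i}(\eta_{a \to i})\, dP_{i \to a}(\eta_{i \to a})\; z_{ia}^m , \qquad \mathcal{Z}_a = \int \prod_{i \in \partial a} dP_{i \to a}(\eta_{i \to a})\; z_a^m , \qquad \mathcal{Z}_i = \int \prod_{a \in \partial i} dP_{a \to i}(\eta_{a \to i})\; z_i^m . \tag{18} $$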

As in the RS case, one can heuristically apply (15), (16) on any graph, even if it is not a tree. Of particular interest is the limit $B \to \emptyset$. Equations (15), (16) may have two behaviors in this limit: (i) All the distributions $P_{i \to a}$, $P_{a \to i}$ become Dirac deltas in this limit. In this case a 'far away' boundary has small influence on the system, and it is easily seen by comparing (11) and (17) that $\Phi(m) = m\phi$. (ii) These distributions remain non-trivial in the limit $B \to \emptyset$. This case is interpreted as a consequence of the existence of many pure states. In this situation, even a small boundary influences the system by selecting one of such states. We thus interpret the $B \to \emptyset$ limit of $\Phi(m)$ as an estimate of the replicated potential (6).

In Sec. II B we emphasized the special role played by the value $m = 1$: the dynamical transition is signaled by the appearance of a non-trivial solution of the 1RSB equations with $m = 1$. This is particularly clear in the present derivation of the 1RSB equations. Indeed, the distribution $\tilde{\mu}$ of the boundary condition coincides in this case with the Boltzmann distribution $\mu$.
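
The last statement can be verified in one line from Eqs. (13) and (14), assuming (as the notation suggests) that $Z^{\underline{\sigma}_B}$ sums the weight $w$ over all configurations that agree with $\underline{\sigma}_B$ on $B$, so that $\sum_{\underline{\sigma}'_B} Z^{\underline{\sigma}'_B} = Z$:

```latex
\tilde{\mu}(\underline{\sigma}_B)\Big|_{m=1}
  = \frac{Z^{\underline{\sigma}_B}}{\sum_{\underline{\sigma}'_B} Z^{\underline{\sigma}'_B}}
  = \frac{1}{Z} \sum_{\underline{\tau}} w(\underline{\tau})\,
      \mathbb{I}(\underline{\tau}_B = \underline{\sigma}_B)
  = \mu(\underline{\sigma}_B) .
```

In words, at $m = 1$ each boundary condition is weighted by the number (or total weight) of configurations compatible with it, which is exactly the marginal of the Boltzmann measure on $B$.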

The existence of a non-trivial solution of the 1RSB equations at $m = 1$ is thus related to a peculiar form of long range correlations under $\mu$, as first pointed out in [26]. Such correlations can be measured through a point-to-set correlation function [13, [H, . For concreteness let us give an expression of this correlation in the case of Ising spins. Given a variable node $i$ and a set of variable nodes $B$, we let

$$ C(i, B) = \sum_{\underline{\sigma}_B} \mu(\underline{\sigma}_B) \Big( \sum_{\sigma_i} \mu(\sigma_i | \underline{\sigma}_B)\, \sigma_i \Big)^2 - \Big( \sum_{\sigma_i} \mu(\sigma_i)\, \sigma_i \Big)^2 . \tag{19} $$

The reader will recognize the analogy between this expression and the difference $q_1 - q_0$ of intra and inter-state overlaps [s^l. The Boltzmann measure has long range point-to-set correlations if $C(i, B)$ does not decay to 0 when $d(i, B)$ grows. Such correlations were shown in [31, 32] to imply a diverging relaxation time.



^6 More precisely, with respect to the measure $\tilde{\mu}_{a \to i}$ (resp. $\tilde{\mu}_{i \to a}$) defined similarly for the factor graph $G_{a \to i}$ (resp. $G_{i \to a}$).

D. Application to constraint satisfaction problems

This short overview of the cavity method did not rely on any hypothesis on the form of the weight factors $w_a$ in Eq. (1). We now comment briefly on the way this general formalism is applied to constraint satisfaction problems (CSP), in order to clarify the relationship of the present work with previous studies. In a CSP the factors $a$ correspond to constraints, which can be either satisfied or not by the configuration of their adjacent variables, $\underline{\sigma}_{\partial a}$. For a satisfiable instance of a CSP one can take $w_a$ to be the indicator function of the event 'constraint $a$ is satisfied.' Then the law defined in (1) is the uniform distribution over the solutions of the CSP, the partition function counts the number of such solutions and the free entropy reduces to the logarithm of the number of solutions. This "entropic" method [s^ is the most adequate to the study of the satisfiable phase.

This approach is however ill-defined for unsatisfiable instances. The usual way to handle this case is to define a cost function $E(\underline{\sigma})$ on the space of configurations, equal to the number of unsatisfied constraints under the assignment $\underline{\sigma}$. Following the traditional notations of statistical mechanics one introduces an inverse temperature $\beta$ and weighs the configurations with $w(\underline{\sigma}) = \exp[-\beta E(\underline{\sigma})]$. Small temperatures (large $\beta$) favor low-energy configurations; in the limit $\beta \to \infty$ the measure $\mu$ concentrates on the optimal configurations, which maximize the number of satisfied constraints. Let us detail this approach, which was originally followed in [H, HI]. At the 1RSB level the pure states are characterized by their energy density $e$ and their entropy density $s$, with the free entropy density given by $\phi = s - \beta e$. Defining the complexity $\Sigma(s, e)$ according to the number of pure states with these two characteristics, Eq. (7) becomes

$$ \Phi(m) = \sup_{s, e} \left[ \Sigma(s, e) + m (s - \beta e) \right] . \tag{20} $$

If one now takes the limit $\beta \to \infty$ and assumes $e > 0$, the entropic term becomes irrelevant; to obtain a finite result one has to take at the same time $m \to 0$ such that the product $\beta m$, usually denoted $y$, remains finite. One thus obtains

$$ \Phi_e(y) = \sup_{e} \left[ \Sigma_e(e) - y e \right] , \qquad \Sigma_e(e) = \sup_{s} \Sigma(s, e) . \tag{21} $$

In the unsatisfiable phase, the 'energetic' cavity approach allows one to characterize the minimal energy of the problem.

In the case of satisfiable problems, one has to perform a second limit $y \to \infty$ (after $\beta \to \infty$) to concentrate on the pure states with $e = 0$. It follows that the complexity thus computed is $\sup_s \Sigma(s, e = 0)$, i.e. the maximum of the entropic complexity. In other words the procedure $y \to \infty$ after $\beta \to \infty$ is equivalent to performing the entropic computation with a Parisi parameter $m = 0$, i.e. to weighing all the pure states in the same way, irrespective of their sizes. This is not a problem for the determination of the satisfiability threshold $\alpha_s$, which corresponds to the disappearance of all zero-energy pure states, hence to the vanishing of the maximal complexity $\Sigma(m = 0)$. However the value of $\alpha_d$ in ^IQ. corresponds to the appearance of a solution of the 1RSB equations with $m = 0$, and not with $m = 1$, which we argued to be the relevant value for the definition of $\alpha_d$.

In the rest of the paper we shall follow the entropic cavity method, i.e. we take (1) to be the uniform measure over the solutions of the CSP under study and keep a finite value for the Parisi parameter $m$. Before entering the details of this approach on the example of random $k$-satisfiability, let us mention that the existence of exponentially numerous pure states (called clusters in this context) for some values of $\alpha$ and $k$ has been proved in [35, 36]. An intrinsic limitation of these works was that clusters were defined by much stricter conditions than the ones described above (which thus implied limitations on $\alpha$, $k$). The consequences of the existence of a distribution of cluster sizes have also been investigated in a toy model in [37].

We should also emphasize that for the simpler CSP known as XORSAT [H, [s^ , a precise characterization of the clusters has been achieved through rigorous methods. A good part of the phenomena studied in the present paper is however absent from this simpler model. In particular all clusters of XORSAT have the same size because of the linear structure of the constraints.



III. THE CAVITY METHOD APPLIED TO THE RANDOM k-SAT PROBLEM

A. Some definitions 

In the application of the formalism to $k$-satisfiability, we use $\sigma_i \in \mathcal{X} = \{-1, +1\}$ to encode the Boolean variables. A constraint $a$ on $k$ variables $\underline{\sigma}_{\partial a}$ is satisfied by all the $2^k$ configurations except one, let us call it $\underline{J}^a = \{J_i^a : i \in \partial a\}$, in which all the literals of the clause are false. The weight factors are thus defined as $w_a(\underline{\sigma}_{\partial a}) = \mathbb{I}(\underline{\sigma}_{\partial a} \neq \underline{J}^a)$, the indicator function of the event "clause $a$ is satisfied."

FIG. 1: An example of the factor graph representation of a satisfiability formula for $k = 3$. The values $J_i^a$ are encoded by drawing a solid (resp. dashed) edge between clause $a$ and variable $i$ if $\sigma_i = +1$ (resp. $-1$) satisfies clause $a$. The distances between some of the variable nodes are $d_{i,j} = d_{i,j'} = d_{i,j''} = 1$ and $d_{j,j'} = 2$. The neighborhoods are for instance $\partial i = \{a, b, c\}$, $\partial a = \{i, j, j''\}$, $\partial_+ i = \{a\}$, $\partial_- i = \{b, c\}$, $\partial_+ i(a) = \emptyset$, $\partial_- i(a) = \{b, c\}$, $\partial_+ i(b) = \{c\}$, $\partial_- i(b) = \{a\}$.

A formula is represented as a factor graph (cf. Fig. 1) whose edges are labeled by $J_i^a$. This suggests refining the definition of the neighborhoods. Given a variable node $i$, $\partial_+ i$ (resp. $\partial_- i$) will denote the set of clauses which are satisfied by $\sigma_i = +1$ (resp. $\sigma_i = -1$). Further, given a clause $a \in \partial i$ we call $\partial_+ i(a)$ (resp. $\partial_- i(a)$) the set of clauses in $\partial i \setminus a$ which are satisfied by the same (resp. opposite) value of $\sigma_i$ as is $a$.
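
These signed neighborhoods are simple to compute from a formula. The sketch below reproduces the sets quoted in the caption of Fig. 1 for the variable $i$. The dictionary encoding and the function names are assumptions, and every variable other than `"i"` (together with its sign) is a placeholder, since the figure itself is not available here.

```python
def signed_neighborhoods(formula, i):
    """Return (d_plus_i, d_minus_i): clauses satisfied by sigma_i = +1 / sigma_i = -1.

    `formula` maps a clause name to {variable: value of the variable satisfying it}.
    """
    plus = {a for a, literals in formula.items() if literals.get(i) == +1}
    minus = {a for a, literals in formula.items() if literals.get(i) == -1}
    return plus, minus

def cavity_neighborhoods(formula, i, a):
    """Return (d_plus_i(a), d_minus_i(a)) for a clause a assumed to contain variable i:
    the other clauses around i satisfied by the same / the opposite value as a."""
    plus, minus = signed_neighborhoods(formula, i)
    same, opposite = (plus, minus) if a in plus else (minus, plus)
    return same - {a}, opposite - {a}

# Edges around variable "i" as in the caption of Fig. 1 (k = 3).
fig1 = {
    "a": {"i": +1, "j": +1, "j2": -1},   # placeholder signs for j, j''
    "b": {"i": -1, "x": +1, "y": +1},    # placeholder variables x, y
    "c": {"i": -1, "z": -1, "w": +1},    # placeholder variables z, w
}
```

Here `cavity_neighborhoods(fig1, "i", "b")` gives `({"c"}, {"a"})`, matching $\partial_+ i(b) = \{c\}$ and $\partial_- i(b) = \{a\}$ in the caption.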

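For $k$-SAT formulas the general RS cavity equations (9), (10) can be written in a fairly explicit form. As the variables take only two values the cavity probability messages $\eta_{a \to i}$ and $\eta_{i \to a}$ can be parametrized by a single real number, that we shall call respectively $u_{a \to i}$ and $h_{i \to a}$ and define by

$$ \eta_{a \to i}(\sigma_i) = \frac{1 - J_i^a \sigma_i \tanh u_{a \to i}}{2} , \qquad \eta_{i \to a}(\sigma_i) = \frac{1 - J_i^a \sigma_i \tanh h_{i \to a}}{2} . \tag{22} $$

With these conventions Eqs. (9), (10) take the form

$$ u_{a \to i} = f(\{h_{j \to a}\}_{j \in \partial a \setminus i}) , \qquad f(h_1, \dots, h_{k-1}) = -\frac{1}{2} \log \Big( 1 - \prod_{i=1}^{k-1} \frac{1 - \tanh h_i}{2} \Big) , \tag{23} $$

$$ h_{i \to a} = \sum_{b \in \partial_+ i(a)} u_{b \to i} - \sum_{b \in \partial_- i(a)} u_{b \to i} . \tag{24} $$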
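
In this parametrization a belief propagation update reduces to two scalar functions. The following sketch implements Eqs. (23) and (24) directly; the function names are assumptions. With the convention of Eq. (22), a positive $h_{i \to a}$ means that variable $i$ is biased towards the value satisfying clause $a$.

```python
import math

def clause_to_variable(h_incoming):
    """Eq. (23): u = -1/2 log(1 - prod_j (1 - tanh h_j) / 2), over the k-1 other variables.

    The argument of the log vanishes only if all incoming h are -infinity, i.e. if
    every other variable is frozen to violate the clause; that case is not handled here.
    """
    prod = 1.0
    for h in h_incoming:
        prod *= (1.0 - math.tanh(h)) / 2.0
    return -0.5 * math.log(1.0 - prod)

def variable_to_clause(u_same, u_opposite):
    """Eq. (24): sum of u over d_+i(a) minus sum of u over d_-i(a)."""
    return sum(u_same) - sum(u_opposite)

# k = 3 with two unbiased neighbours: prod = 1/4, hence u = (1/2) log(4/3).
u_unbiased = clause_to_variable([0.0, 0.0])
# One neighbour almost surely satisfies the clause: the message carries no bias.
u_relaxed = clause_to_variable([50.0, 0.0])
h_example = variable_to_clause([u_unbiased, u_unbiased], [u_unbiased])
```

Since $u \ge 0$ always, a clause can only push its variable towards the value that satisfies it, and does so more strongly the more the other variables are biased towards violating it.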

We are interested in the regime where the number $M$ of uniformly chosen clauses and the number of variables $N$ both diverge at fixed ratio $\alpha = M/N$. The random factor graphs thus generated enjoy properties reminiscent of the Erdős-Rényi random graphs $G(N, M)$ [40, 41]. In particular, for a uniformly random variable node $i$, the number of clauses in $\partial_+ i$ and $\partial_- i$ converges to two i.i.d. Poisson random variables of mean $\alpha k / 2$. The same statement is true for $\partial_+ i(a)$ and $\partial_- i(a)$ when $(i, a)$ is a uniformly chosen edge of the factor graph. The degree distribution is a very local description of a graph, looking at one node or edge only. It is however easy to show that any bounded neighborhood of a uniformly random node $i$ converges to a random (Galton-Watson) tree with the same degree distribution [41].



B. The RS description of the random formulae ensemble 



The replica-symmetric treatment of the random $k$-SAT problem was first worked out using the replica formalism in Q. In the cavity formulation one interprets the BP equations (9), (10), (23), (24) in a probabilistic way. More precisely, we introduce the distributions of $h_{i \to a}$, $u_{a \to i}$ (over the choice of the random formula) and denote them as $\mathcal{P}_{(0)}(h)$ and $\mathcal{Q}_{(0)}(u)$. These distributions satisfy the distributional equations:

$$ u \stackrel{d}{=} f(h_1, \dots, h_{k-1}) , \qquad h \stackrel{d}{=} \sum_{i=1}^{l_+} u_i^+ - \sum_{i=1}^{l_-} u_i^- . \tag{25} $$

In these expressions $h$, $\{h_i\}$ (resp. $u$, $\{u_i^\pm\}$) are independent copies of the random variable of distribution $\mathcal{P}_{(0)}(h)$ (resp. $\mathcal{Q}_{(0)}(u)$), the function $f$ is defined in Eq. (23) and $l_\pm$ are two independent Poisson random variables of mean $\alpha k / 2$. The symbol $\stackrel{d}{=}$ denotes identity in distribution^7.
The RS prediction for the entropy reads 

$$ \phi_{(0)} = -\alpha k\, \mathbb{E} \log z_1(u, h) + \alpha\, \mathbb{E} \log z_2(h_1, \dots, h_k) + \mathbb{E} \log z_3(u_1^+, \dots, u_{l_+}^+, u_1^-, \dots, u_{l_-}^-) , \tag{26} $$

where the expectations are over i.i.d. copies of the random variables $u$ and $h$, and $l_\pm$ are as above. The various entropy shifts are obtained by rewriting the $z$'s in Eq. (12) in terms of $u$ and $h$,

$$ z_1(u, h) = 1 + \tanh h \tanh u , \tag{27} $$

$$ z_2(h_1, \dots, h_k) = 1 - \prod_{i=1}^{k} \frac{1 - \tanh h_i}{2} , \tag{28} $$

$$ z_3(u_1^+, \dots, u_{l_+}^+, u_1^-, \dots, u_{l_-}^-) = \prod_{i=1}^{l_+} (1 + \tanh u_i^+) \prod_{i=1}^{l_-} (1 - \tanh u_i^-) + \prod_{i=1}^{l_+} (1 - \tanh u_i^+) \prod_{i=1}^{l_-} (1 + \tanh u_i^-) . \tag{29} $$

Similarly the RS overlap can be computed as

$$ q_0 = \mathbb{E}[\tanh^2 h] . \tag{30} $$



Several equivalent expressions of the RS entropy can be found in the literature; the choice we made in (26) has the advantage of being variational. By this we mean that the stationarity conditions of the function $\phi_{(0)}[\mathcal{P},\mathcal{Q},\alpha]$ with respect to $\mathcal{P}$ and $\mathcal{Q}$ are nothing but the self-consistency equations (25). Note also that the rigorous results of [42, 43] imply that* the entropy density $\phi$ is upper-bounded by the RS $\phi_{(0)}$ for any trial distribution $\mathcal{P}$, as long as $\mathcal{Q}$ is linked to $\mathcal{P}$ by the first equation in (25), for a regularized version of the model at finite temperature. Moreover the RS description was proven to be valid for small values of $\alpha$ in [31].
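As an illustration of Eqs. (26)-(30), the following minimal sketch estimates the RS entropy and overlap by Monte Carlo from two lists of fields assumed to be distributed according to $\mathcal{P}_{(0)}$ and $\mathcal{Q}_{(0)}$. The function names (`rs_entropy`, `rs_overlap`, `poisson`) are hypothetical and not taken from the paper; no guard against vanishing arguments of the logarithms is included, so this is only meant for finite, moderate fields.

```python
import math
import random

def z1(u, h):
    # Eq. (27)
    return 1.0 + math.tanh(h) * math.tanh(u)

def z2(hs):
    # Eq. (28), hs contains k fields
    prod = 1.0
    for h in hs:
        prod *= (1.0 - math.tanh(h)) / 2.0
    return 1.0 - prod

def z3(u_plus, u_minus):
    # Eq. (29)
    a = b = 1.0
    for u in u_plus:
        a *= 1.0 + math.tanh(u)
        b *= 1.0 - math.tanh(u)
    for u in u_minus:
        a *= 1.0 - math.tanh(u)
        b *= 1.0 + math.tanh(u)
    return a + b

def poisson(mean, rng):
    # Knuth's multiplication method (adequate for moderate means)
    threshold, count, p = math.exp(-mean), 0, rng.random()
    while p > threshold:
        count += 1
        p *= rng.random()
    return count

def rs_entropy(hs, us, k, alpha, samples, rng):
    # Monte Carlo estimate of Eq. (26) from populations hs ~ P_(0), us ~ Q_(0)
    t1 = t2 = t3 = 0.0
    for _ in range(samples):
        t1 += math.log(z1(rng.choice(us), rng.choice(hs)))
        t2 += math.log(z2([rng.choice(hs) for _ in range(k)]))
        u_plus = [rng.choice(us) for _ in range(poisson(alpha * k / 2.0, rng))]
        u_minus = [rng.choice(us) for _ in range(poisson(alpha * k / 2.0, rng))]
        t3 += math.log(z3(u_plus, u_minus))
    return (-alpha * k * t1 + alpha * t2 + t3) / samples

def rs_overlap(hs):
    # Eq. (30)
    return sum(math.tanh(h) ** 2 for h in hs) / len(hs)
```

With all fields equal to zero the three shifts are $z_1 = 1$, $z_2 = 1 - 2^{-k}$ and $z_3 = 2$, so the estimate reduces to $\log 2 + \alpha \log(1-2^{-k})$ whatever the Poisson draws, which gives a quick sanity check of the implementation.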

The numerical resolution of the equation on the order parameter is relatively easy. The distributions $\mathcal{P}_{(0)}$ and $\mathcal{Q}_{(0)}$ can indeed be represented by samples (or populations) of a large number $\mathcal{N}$ of representatives, $\{h_i\}_{i=1}^{\mathcal{N}}$ and $\{u_i\}_{i=1}^{\mathcal{N}}$. The fixed point condition stated in (25) is looked for by an iterative population dynamics algorithm [25, 41, 45].
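The iteration just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used for the results of this paper: the names `rs_population_dynamics`, `f` and `poisson` are hypothetical, the initial population is an arbitrary Gaussian one, and the clamp inside `f` is a purely numerical guard against $\log 0$.

```python
import math
import random

def f(hs):
    # u = f(h_1, ..., h_{k-1}) = -(1/2) log(1 - prod_i (1 - tanh h_i)/2)
    prod = 1.0
    for h in hs:
        prod *= (1.0 - math.tanh(h)) / 2.0
    prod = min(prod, 1.0 - 1e-15)  # numerical guard, not part of Eq. (25)
    return -0.5 * math.log(1.0 - prod)

def poisson(mean, rng):
    # Knuth's multiplication method (adequate for moderate means)
    threshold, count, p = math.exp(-mean), 0, rng.random()
    while p > threshold:
        count += 1
        p *= rng.random()
    return count

def rs_population_dynamics(k, alpha, size=2000, sweeps=50, seed=0):
    rng = random.Random(seed)
    hs = [rng.gauss(0.0, 1.0) for _ in range(size)]  # arbitrary starting point
    us = [0.0] * size
    for _ in range(sweeps):
        # first equation in (25): u = f(h_1, ..., h_{k-1})
        for j in range(size):
            us[j] = f([rng.choice(hs) for _ in range(k - 1)])
        # second equation in (25): h = sum of l_+ fields minus sum of l_- fields
        for i in range(size):
            l_plus = poisson(alpha * k / 2.0, rng)
            l_minus = poisson(alpha * k / 2.0, rng)
            hs[i] = (sum(rng.choice(us) for _ in range(l_plus))
                     - sum(rng.choice(us) for _ in range(l_minus)))
    return hs, us
```

For instance `hs, us = rs_population_dynamics(4, 9.0)` returns two populations from which averages such as $E[\tanh^2 h]$ can be estimated; the population size and number of sweeps control the statistical accuracy.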

We turn now to the cavity formalism at the 1RSB level, which assumes the organization of pure states described in Section II B.



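C. The 1RSB description of the random formulae ensemble

As in the RS case, when the underlying formula is random, the messages $P_{i\to a}$, $Q_{a\to i}$ along a uniformly random edge become random variables, whose distributions are denoted as $\mathcal{P}_{(1)}[P]$, $\mathcal{Q}_{(1)}[Q]$. These distributions satisfy a couple of distributional equations, that are the probabilistic version of Eqs. (15), (16):

$$ Q(\cdot) \stackrel{d}{=} \frac{1}{Z_4[P_1,\dots,P_{k-1}]} \int \prod_{i=1}^{k-1} dP_i(h_i)\; \delta\big(\cdot - f(h_1,\dots,h_{k-1})\big)\; z_4(h_1,\dots,h_{k-1})^m , \tag{31} $$

$$ P(\cdot) \stackrel{d}{=} \frac{1}{Z_3[Q_1^+,\dots,Q_{l_+}^+,Q_1^-,\dots,Q_{l_-}^-]} \int \prod_{i=1}^{l_+} dQ_i^+(u_i^+) \prod_{i=1}^{l_-} dQ_i^-(u_i^-)\; \delta\Big(\cdot - \sum_{i=1}^{l_+} u_i^+ + \sum_{i=1}^{l_-} u_i^-\Big)\; z_3\big(\{u_i^+\}_{i=1}^{l_+},\{u_i^-\}_{i=1}^{l_-}\big)^m , \tag{32} $$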



† More explicitly, given two random variables $X$ and $Y$ we write $X \stackrel{d}{=} Y$ if the distributions of $X$ and $Y$ coincide. For instance, if $X, X_1, X_2$ are iid standard normal random variables, $X \stackrel{d}{=} (X_1+X_2)/\sqrt{2}$.
* In [42, 43] this claim is made for $k$ even. However the proof holds verbatim for $k$ odd as well. To the best of our knowledge, this was observed first by Elitza Maneva in 2005.
where the $P$'s (resp. $Q$'s) are i.i.d. from $\mathcal{P}_{(1)}$ (resp. $\mathcal{Q}_{(1)}$) and $l_\pm$ have the above stated Poissonian distribution. The entropy shift $z_3$ used in Eq. (32) was defined in Eq. (29), while $z_4$ is given by

$$ z_4(h_1,\dots,h_{k-1}) = 2 - \prod_{i=1}^{k-1} \frac{1-\tanh h_i}{2} . \tag{33} $$



Finally, the 1RSB potential is obtained by taking the expectation of its single-sample expression. One gets

$$ \Phi(m) = -\alpha k\, E \log Z_1[Q,P] + \alpha\, E \log Z_2[P_1,\dots,P_k] + E \log Z_3[Q_1^+,\dots,Q_{l_+}^+,Q_1^-,\dots,Q_{l_-}^-] , \tag{34} $$

where the factors $Z_i$ are weighted averages of the corresponding entropy shifts,

$$ Z_1[Q,P] = \int dP(h)\, dQ(u)\; z_1(u,h)^m , \tag{35} $$

$$ Z_2[P_1,\dots,P_k] = \int \prod_{i=1}^{k} dP_i(h_i)\; z_2(h_1,\dots,h_k)^m , \tag{36} $$

$$ Z_3[Q_1^+,\dots,Q_{l_+}^+,Q_1^-,\dots,Q_{l_-}^-] = \int \prod_{i=1}^{l_+} dQ_i^+(u_i^+) \prod_{i=1}^{l_-} dQ_i^-(u_i^-)\; z_3(u_1^+,\dots,u_{l_+}^+,u_1^-,\dots,u_{l_-}^-)^m . \tag{37} $$

The inter and intra-state overlaps are given, respectively, by

$$ q_0 = E\left[ \left( \int dP(h) \tanh h \right)^2 \right] , \qquad q_1 = E\left[ \int dP(h) \tanh^2 h \right] . \tag{38} $$



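The variational property discussed at the RS level still applies to the 1RSB potential. This is of particular interest for the computation of the internal entropy of the states, given by a derivative with respect to $m$. This derivative can be applied to the explicit dependence only, and yields

$$ \phi_{\rm int}(m) = -\alpha k\, E\left[ \frac{\int dP(h)\, dQ(u)\; z_1(u,h)^m \log z_1(u,h)}{Z_1[Q,P]} \right] + \alpha\, E\left[ \frac{\int \prod_{i=1}^{k} dP_i(h_i)\; z_2(\{h_i\}_{i=1}^{k})^m \log z_2(\{h_i\}_{i=1}^{k})}{Z_2[P_1,\dots,P_k]} \right] + E\left[ \frac{\int \prod_{i=1}^{l_+} dQ_i^+(u_i^+) \prod_{i=1}^{l_-} dQ_i^-(u_i^-)\; z_3(\{u_i^+\},\{u_i^-\})^m \log z_3(\{u_i^+\},\{u_i^-\})}{Z_3[Q_1^+,\dots,Q_{l_+}^+,Q_1^-,\dots,Q_{l_-}^-]} \right] . \tag{39} $$

The rigorous results of [42, 43] also imply $\phi \le \Phi(m)/m$ for any value of $m$ in $(0,1)$, and any trial order parameter $\mathcal{P}$ (with $\mathcal{Q}$ defined by Eq. (31)).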

The numerical resolution of the 1RSB equations (31), (32) is in general much harder than the one of their RS counterparts (compare with Eq. (25)). The population dynamics algorithm represents $\mathcal{P}_{(1)}$ by a sample of distributions $\{P_i\}_{i=1}^{\mathcal{N}}$, which themselves have to be encoded, for each $i$, by a finite set of cavity fields $\{h_{i,j}\}_{j=1}^{\mathcal{N}'}$. This drastically limits the sizes $\mathcal{N}$ and $\mathcal{N}'$, and hence the precision of the numerical results. Moreover generating one element, say $Q_i$, from $k-1$ $P_i$'s is by itself a non trivial task. The various fields representing $Q_i$ are weighted in a non uniform way because of the factor $z_4^m$ in Eq. (31), which forces the use of delicate resampling procedures.
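To make the reweighting issue concrete, here is a minimal sketch of one clause update, Eq. (31), where each $P_i$ is encoded as a list of fields. It uses plain multinomial resampling, which is only one simple choice and not necessarily the procedure used for the results reported here. The function name `reweighted_clause_update` is hypothetical, and the weight is taken to be $z_4 = 2 - \prod_i (1-\tanh h_i)/2$, the normalization of the clause-to-variable BP message, which is an assumption of this sketch.

```python
import math
import random

def reweighted_clause_update(P_list, m, n_out, rng):
    """P_list: k-1 populations of cavity fields, one per incoming P_i.
    Returns (fields representing the new Q, Monte Carlo estimate of Z_4)."""
    candidates, weights = [], []
    for _ in range(n_out):
        prod = 1.0
        for P in P_list:
            prod *= (1.0 - math.tanh(rng.choice(P))) / 2.0
        prod = min(prod, 1.0 - 1e-15)                   # numerical guard against log(0)
        candidates.append(-0.5 * math.log(1.0 - prod))  # u = f(h_1, ..., h_{k-1})
        weights.append((2.0 - prod) ** m)               # reweighting factor z_4^m
    z4_estimate = sum(weights) / n_out
    # multinomial resampling: draw n_out fields with probability proportional to z_4^m
    return rng.choices(candidates, weights=weights, k=n_out), z4_estimate
```

Multinomial resampling duplicates heavily weighted candidates and discards lightly weighted ones, so the effective sample size shrinks when the weights are very uneven; this is one way to see why the sizes $\mathcal{N}$ and $\mathcal{N}'$ limit the precision.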

These equations can be greatly simplified analytically for two particular values of $m$, namely 0 and 1. For the sake of readability we postpone the discussion of these important simplifications until Section V and proceed in the next section with the presentation and the interpretation of the results obtained either at arbitrary $m$ with the full numerical procedure (whose implementation details are exposed in Appendix A) or in $m = 0, 1$ with the simplified, more precise ones.



IV. TRANSITIONS IN THE SATISFIABLE REGIME OF RANDOM k-SAT

A. The dynamical, condensation and satisfiability transitions for $k \ge 4$

Let us begin our discussion of the satisfiable regime of random $k$-SAT by studying the case $k = 4$, the values $k > 4$ having the same qualitative behavior. On the other hand, the phenomenology of 3-SAT is different and we report on it in Sec. IV D.

FIG. 2: The point-to-set correlation function for $k = 4$, from left to right $\alpha = 9.30$, $\alpha = 9.33$, $\alpha = 9.35$ and $\alpha = 9.40$.

FIG. 3: The complexity $\Sigma$ and the internal entropy $\phi_{\rm int}$ for the values $m = 0, 1$, and $m = m_s$ in the 1RSB regime, for $k = 4$.

Following the program of Sec. II we first have to determine the value $\alpha_d$ for the appearance of a non-trivial solution of the 1RSB equations with $m = 1$. To this aim we compute the point-to-set correlation function $C_\ell$, that is the average of the correlation function between a randomly chosen variable $i$ and the set $B$ of variables at distance $\ell$ from it. The plots of Fig. 2 show that for $\alpha < \alpha_d \approx 9.38$ this correlation vanishes at large distance, while for larger values of $\alpha$ a strictly positive long range correlation sets in discontinuously. To distinguish between the d1RSB and 1RSB regime we then compute the complexity $\Sigma(m = 1)$. As demonstrated in Fig. 3 this is strictly positive at $\alpha_d$, then decreases continuously until it vanishes at $\alpha_c \approx 9.547$. Finally the satisfiability transition $\alpha_s$ is found from the criterion of vanishing of $\Sigma(m = 0)$, i.e. the maximum of the entropic complexity curve (see Fig. 3): the value $\alpha_s \approx 9.931$ is in agreement with [13] and we shall show in Sec. V B that this is indeed the same calculation.

To summarize, we find the three regimes RS, d1RSB, 1RSB described in Sec. II B occurring in this order, for the values of $\alpha$ in $[0,\alpha_d]$, $[\alpha_d,\alpha_c]$ and $[\alpha_c,\alpha_s]$. We expect this pattern of transitions to be the same for all $k \ge 4$. This is supported by our numerical investigations for $k = 4, 5, 6$ (see Tab. I for a summary of the numerical values of the thresholds), and by the large-$k$ expansions presented in Sec. VI.

The entropy density (see Fig. 3) is given by the RS formula both in the RS and d1RSB regimes. In the latter case it has to be understood as the sum of the complexity $\Sigma(m = 1)$ and of the internal entropy of the associated states, $\phi_{\rm int}(m = 1)$. On the contrary for $\alpha \in [\alpha_c,\alpha_s]$ it is necessary to compute the whole function $\Sigma(\phi)$ by varying $m$. The entropy density coincides with the one of dominant clusters, and is given by the point where $\Sigma(\phi)$ vanishes.
  k     alpha_d    alpha_c    alpha_s [12]    alpha_f
  3      3.86       3.86       4.267
  4      9.38       9.547      9.931           9.88
  5     19.16      20.80      21.117
  6     36.53      43.08      43.37           39.87 [46]

TABLE I: Numerical values of the various critical thresholds. For $k = 3$ we have formally $\alpha_c = \alpha_d$, see the text for details on the nature of the difference between $k = 3$ and $k \ge 4$.
FIG. 4: The complexity for $k = 4$ and several values of $\alpha$: from top to bottom $\alpha = 9.3, 9.45, 9.6, 9.7, 9.8$ and $9.9$.



B. The entropic complexity curves

The curves $\Sigma(\phi)$ are shown in Fig. 4 for several values of $\alpha$. The symbols are obtained in a parametric way, by solving the 1RSB equations for various values of $m$ and plotting the point $(\phi_{\rm int}(m), \Sigma(m))$. The lines in Fig. 4 are numerical interpolations, obtained by fitting not directly $\Sigma(\phi)$, but instead the data for $\Phi(m)$ with a generic smooth function‡ and then analytically differentiating the fitting function to obtain the curves in Fig. 4. The agreement of this fitting procedure with the parametric plot is excellent. The three regimes are clearly illustrated on this figure:

• For $\alpha < \alpha_d$ a portion of the curve can exist (for instance there is a solution of the 1RSB equation with $m = 0$ for $\alpha > 8.297$ [12]), yet it has no point of slope $-m = -1$. The contribution of these clusters is negligible compared to the dominant RS cluster.

• For $\alpha \in [\alpha_d, \alpha_c]$ (see e.g. $\alpha = 9.45$ data in Fig. 4) the complexity $\Sigma(m = 1)$ exists and is positive (it is marked by a black circle in the figure).

• For $\alpha \in [\alpha_c, \alpha_s]$ (see e.g. $\alpha = 9.6, 9.7, 9.8, 9.9$ in Fig. 4) the complexity $\Sigma(m = 1)$ is negative and thus the $\Sigma(\phi)$ curve vanishes at $\phi(m_s)$ (marked with a black square), where the slope (in absolute value) is smaller than 1 and equals $m_s(\alpha)$. The measure is dominated by a subexponential number of clusters of entropy $\phi(m_s)$, shown as a function of $\alpha$ in Fig. 3.
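The parametric construction of the curve $\Sigma(\phi)$ relies on the Legendre structure $\phi_{\rm int}(m) = \Phi'(m)$ and $\Sigma(m) = \Phi(m) - m\,\Phi'(m)$. The sketch below illustrates it on tabulated values of $\Phi(m)$; it replaces the smooth fit mentioned above by central finite differences, which is a simplification chosen here for brevity, and the function name `complexity_curve` is hypothetical.

```python
def complexity_curve(ms, phis):
    """ms: increasing grid of Parisi parameters, phis: values of Phi(m) on it.
    Returns a list of (m, phi_int, Sigma) at the interior grid points."""
    points = []
    for i in range(1, len(ms) - 1):
        # central difference approximation of Phi'(m_i)
        slope = (phis[i + 1] - phis[i - 1]) / (ms[i + 1] - ms[i - 1])
        # Sigma = Phi - m * Phi', so that d Sigma / d phi_int = -m
        points.append((ms[i], slope, phis[i] - ms[i] * slope))
    return points
```

Plotting the second entry against the third reproduces a parametric plot of the type shown in Fig. 4, and the value $m_s$ is the point of the grid where the third entry changes sign.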



‡ We have tried different fitting functions and all provide equivalent and very good results thanks to the smoothness of $\Phi(m)$.
FIG. 5: The value of the Parisi parameter $m_s$ in the thermodynamically relevant pure states of the 1RSB regime in random 4-SAT, and the freezing transition $m_f$.

The value thus estimated of the Parisi parameter $m_s(\alpha)$ in the 1RSB regime is plotted in Fig. 5 (it is identical to 1 in the d1RSB region). The curve close to the $m_s$ data is not a fit, but instead an explicit approximate expression for $m_s(\alpha)$ which becomes exact in the large $k$ limit (see Sec. VI for details). Indeed the corresponding result of Sec. VI (valid to leading order at large $k$) can be equivalently rewritten as

$$ \frac{\alpha_s - \alpha}{\alpha_s - \alpha_c} = \frac{1 - 2^m (1 - m \log 2)}{2 \log 2 - 1} , \tag{40} $$

and this gives an expression for $m_s(\alpha)$ once values of $\alpha_c$ and $\alpha_s$ determined numerically for $k = 4$ are plugged into Eq. (40). Note that the solution to Eq. (40) is such that (i) $m_s(\alpha_c) = 1$, (ii) $m_s(\alpha_s) = 0$ and (iii) $m_s$ vanishes as a square root at $\alpha_s$. The finite $k$ corrections to the expression (40) seem already small for $k = 4$, as can be inferred from the good agreement with the numerical data displayed in Fig. 5. This fact was also noticed for the coloring problem.

Once we compute the optimal value $m_s$ for each value of $\alpha$, we can plot in Fig. 6 the overlaps $q_0$ and $q_1$ of the dominating clusters as a function of $\alpha$. Notice that the inter-state overlap $q_0$ is an increasing function of $\alpha$ for any fixed value of $m$, but becomes a decreasing function of $\alpha$ between $\alpha_c$ and $\alpha_s$ where we take $m = m_s(\alpha)$.

FIG. 6: Intra and inter-state overlaps for $k = 4$.

We did not attempt a complete determination of the portion of the plane $(\alpha, m)$ where non-trivial solutions of the 1RSB equations can be found. From our numerical investigations it seems that solutions with smaller values of $m$ appear at smaller values of $\alpha$, i.e. the threshold $\alpha_d(m)$ is an increasing function in the range of parameters we considered. In particular, solutions with negative $m$ appear at rather small values of $\alpha$. The limit of very large negative values of $m$ is however difficult to study numerically, and more work could be done on this issue; the corresponding pure states are tiny because their variables are overconstrained, which plagues the numerical resolution of the 1RSB equations.

C. On the presence of frozen variables in clusters of solutions

Another characterization of the clusters of solutions, besides their internal entropy and self-overlap, is the presence or not of frozen variables, that is variables that take the same value in all the solutions of the cluster. In technical terms this corresponds to a non-vanishing weight on $\pm\infty$ in the 1RSB cavity field distributions $P(h)$ (see Eq. (59) below). Our data show that, given a value of $\alpha$, there exists a threshold $m_f(\alpha)$ such that clusters described by $m < m_f$ do contain frozen variables, while those with $m > m_f$ do not. This is consistent with the intuition: the freezing of variables is correlated with a smaller value of the internal entropy, hence of $m$. Numerical estimates for the line $m_f(\alpha)$ are plotted in Fig. 5 for $k = 4$. The large error bars are due to the fact that we have checked the presence of frozen variables only at $m$ values which are multiples of 0.1 (and no interpolation can be done in between, since the property is just true or false). The interpolating curve is a fit to the $m_f(\alpha)$ data with a power law of the form $A(\alpha - 8.297)^{B}$ (for $m = 0$ the critical value of $\alpha$ is $\alpha_d(m = 0) \approx 8.297$ [12]). The freezing transition $\alpha_f$ is defined by the appearance of frozen variables in dominating clusters, that is $m_s(\alpha_f) = m_f(\alpha_f)$. From the crossing of these two lines in Fig. 5 we estimated the freezing threshold for $k = 4$ at $\alpha_f \approx 9.88$.
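The curve $m_s(\alpha)$ entering this crossing can be evaluated from Eq. (40). The right hand side of Eq. (40) is an increasing function of $m$ on $[0,1]$, so a simple bisection suffices. The sketch below uses the $k = 4$ thresholds quoted in Table I as default values; the function name `ms_from_eq40` is hypothetical.

```python
import math

def ms_from_eq40(alpha, alpha_c=9.547, alpha_s=9.931):
    """Solve Eq. (40) for m in [0, 1] by bisection, for alpha in [alpha_c, alpha_s]."""
    target = (alpha_s - alpha) / (alpha_s - alpha_c) * (2.0 * math.log(2.0) - 1.0)
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        # g(m) = 1 - 2^m (1 - m log 2), with derivative 2^m m (log 2)^2 >= 0
        g = 1.0 - 2.0 ** mid * (1.0 - mid * math.log(2.0))
        if g < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Near $m = 0$ one has $1 - 2^m(1 - m\log 2) \simeq (m \log 2)^2/2$, which is the origin of the square root behaviour of $m_s$ at $\alpha_s$ mentioned in the text.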

The fact that the freezing transition occurs after the condensation one for $k = 4$ is not generic; for $k \ge 6$ the threshold $m_f(\alpha)$ reaches 1 at $\alpha_f < \alpha_c$ [46], hence in a part of the d1RSB regime the dominating clusters do contain frozen variables for these values of $k$.

Let us however emphasize that generally $\alpha_d < \alpha_f$, i.e. that in random $k$-satisfiability (and also in $q$-coloring) clustering can occur without implying the freezing of variables. This fact has been obscured up to now because the energetic cavity method [12, 13] focused precisely on the fraction of frozen variables in the $m = 0$ solution of the 1RSB equations, and because in the simpler XORSAT model [38, 39] the freezing and clustering transitions coincide. We refer the reader to [46] for a more extensive study of the freezing transition, in particular its interpretation in terms of the divergence of the minimal rearrangements [47] it induces, and to [37] where it has been proven that frozen variables exist in every cluster for $k \ge 9$ and $\alpha$ large enough.



D. k = 3, a special case

We turn now to the description of our numerical results in the particular case $k = 3$, recently investigated also in [p^]. The onset of long-range point-to-set correlations, displayed in Fig. 7 through the correlation function $C_\ell$, is qualitatively different from $k = 4$ (compare with Fig. 2). The long range correlation $\lim_{\ell\to\infty} C_\ell$ grows indeed continuously from 0 at $\alpha_d$ (in qualitative agreement with the variational approximation of [Q]). In fact this transition coincides with a local instability of the RS solution with respect to 1RSB perturbations (this is a generic fact for all models with continuous dynamic transitions). A numerical procedure can be used to locate precisely this instability [48, 49, 50]. We get the estimate $\alpha_d = \alpha_{\rm stab} \approx 3.86$. Please note that for $k \ge 4$ this local instability occurs after the discontinuous transition, for instance at $\alpha_{\rm stab} \approx 10.2$ for $k = 4$.

For $\alpha > \alpha_d$ the complexity $\Sigma(m = 1)$ decreases continuously from 0 (see lowest curve in Fig. 8): there is no d1RSB regime for 3-SAT. We then turned to the resolution of the 1RSB equations for other values of $m$. In Fig. 8 we plotted the complexity as a function of $\alpha$, for various values of $m$. According to the interpretation of the 1RSB regime of Sec. II B, for each value of $\alpha$ we can find the Parisi parameter $m_s$ such that $\Sigma = 0$, and obtain the 1RSB estimate of the entropy as the internal entropy of these states. We plot this quantity in Fig. 9 together with the replica symmetric (RS) estimate and the value obtained from the $m = 0$ solution.

FIG. 7: The point-to-set correlation function for $k = 3$, from left to right $\alpha = 3.60$, $\alpha = 3.84$, $\alpha = 3.86$, $\alpha = 3.88$.

FIG. 8: The complexity $\Sigma$ for $k = 3$ and $m$ from 0 (highest curve) to 1 (lowest curve). For $0 < m < 1$ the domain of existence of $\Sigma$ may be slightly larger than the one shown in the plot (we have simulated only $\alpha$ values multiples of 0.05).

We also present in Fig. 10 the entropic complexity curves for a few values of $\alpha$. Note that these curves can seem incomplete; in fact for some values of $(\alpha, m)$ we found only inconsistent solutions of the 1RSB equations, as is explained in more details in Appendix C. This might be related to an instability of the 1RSB solution toward higher levels of replica symmetry breaking [50, 51, 52].

V. SIMPLIFICATIONS OF THE 1RSB EQUATIONS

The numerical analysis of the 1RSB equations (31), (32) is, in general, an extremely difficult task. Their analytical control is even more challenging. In this section we explain how the 1RSB approach simplifies in the two cases $m = 0$ and $m = 1$, allowing for a precise numerical calculation of the complexity and internal entropy in these points.

Because of the special role played by the value $m = 1$, see Section II, this makes it possible to estimate precisely the dynamical and condensation thresholds $\alpha_d(k)$ and $\alpha_c(k)$. The simplifications arising at $m = 0$ are on the other hand the reason of the efficiency of the SP algorithm [10]. Here we will show how the states entropy can be computed at a small extra cost with respect to the approach of [10].

For the sake of concreteness, we discuss these simplifications in the case of random $k$-satisfiability. They have however a much wider domain of validity. The same derivations do indeed hold for general mean-field models on sparse random graphs.

FIG. 9: The 1RSB estimate for the entropy of random 3-SAT, compared to the replica symmetric (RS) estimate and to the internal entropy of the $m = 0$ solution, corresponding to the maximum of the $\Sigma(\phi)$ curve.

FIG. 10: The complexity in random 3-SAT, for several values of $\alpha$.

A. m = 1 and tree reconstruction

There is a strong connection between the 1RSB formalism with Parisi parameter $m = 1$ and the tree reconstruction problem (or computation of point-to-set correlation), as discussed in [26] and outlined in Sec. II C. We follow here a somehow inverse perspective with respect to [26]: starting from the 1RSB equations we shall progressively simplify them. At the end we shall comment on their interpretation in terms of the tree reconstruction problem.

Let us first define the averaging functional $\bar h[P]$ (resp. $\bar u[Q]$) which associates to the distribution $P$ (resp. $Q$) of cavity fields a single real through the relations

$$ \tanh \bar h[P] = \int dP(h) \tanh h , \qquad \tanh \bar u[Q] = \int dQ(u) \tanh u . \tag{41} $$

Consider now the right hand side of Eq. (32) for $m = 1$. The normalization factor can be expressed in terms of these averaged fields,

$$ Z_3[Q_1^+,\dots,Q_{l_+}^+,Q_1^-,\dots,Q_{l_-}^-] = z_3\big(\bar u[Q_1^+],\dots,\bar u[Q_{l_+}^+],\bar u[Q_1^-],\dots,\bar u[Q_{l_-}^-]\big) . \tag{42} $$

Using this fact and denoting by $G[Q_1^+,\dots,Q_{l_+}^+,Q_1^-,\dots,Q_{l_-}^-]$ the right hand side of Eq. (32) one can also show that

$$ \bar h\big[G[Q_1^+,\dots,Q_{l_+}^+,Q_1^-,\dots,Q_{l_-}^-]\big] = \sum_{i=1}^{l_+} \bar u[Q_i^+] - \sum_{i=1}^{l_-} \bar u[Q_i^-] . \tag{43} $$

Treating similarly Eq. (31), whose r.h.s. shall be denoted $F[P_1,\dots,P_{k-1}]$, one obtains

$$ Z_4[P_1,\dots,P_{k-1}] = z_4\big(\bar h[P_1],\dots,\bar h[P_{k-1}]\big) , \qquad \bar u\big[F[P_1,\dots,P_{k-1}]\big] = f\big(\bar h[P_1],\dots,\bar h[P_{k-1}]\big) . \tag{44} $$

$\bar h$ (resp. $\bar u$) can be viewed as a random variable, induced by Eq. (41) with $P$ (resp. $Q$) drawn from $\mathcal{P}_{(1)}$ (resp. $\mathcal{Q}_{(1)}$). The above remarks show that their distributions obey the RS self-consistency equation (25). Let us now define a conditional average of $\mathcal{P}_{(1)}$, focusing on the $P$'s in the support of $\mathcal{P}_{(1)}$ with a prescribed value of $\bar h[P]$:

$$ \overline{P}(h|\bar h)\, \mathcal{P}_{(0)}(\bar h) = \int d\mathcal{P}_{(1)}[P]\; P(h)\; \delta\big(\bar h - \bar h[P]\big) . \tag{45} $$

The conditional distribution $\overline{Q}(u|\bar u)$ is defined analogously, with $\mathcal{P}_{(1)}[P]$ replaced by $\mathcal{Q}_{(1)}[Q]$.

Consider again the distributional equations (31), (32). Once the normalization factors have been expressed in terms of the average fields $\bar h$, $\bar u$, the right-hand sides are multi-linear functions of the distributions $P$, $Q$. It is thus possible to take the conditional average as in Eq. (45). This yields closed equations on $\overline{P}$ and $\overline{Q}$:



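$$ \overline{Q}(u|\bar u)\, \mathcal{Q}_{(0)}(\bar u) = \int \prod_{i=1}^{k-1} d\mathcal{P}_{(0)}(\bar h_i)\; \delta\big(\bar u - f(\bar h_1,\dots,\bar h_{k-1})\big) \int \prod_{i=1}^{k-1} d\overline{P}(h_i|\bar h_i)\; \delta\big(u - f(h_1,\dots,h_{k-1})\big)\, \frac{z_4(h_1,\dots,h_{k-1})}{z_4(\bar h_1,\dots,\bar h_{k-1})} , $$

$$ \overline{P}(h|\bar h)\, \mathcal{P}_{(0)}(\bar h) = \sum_{l_+,l_-=0}^{\infty} \frac{e^{-\alpha k} (\alpha k/2)^{l_+ + l_-}}{l_+!\, l_-!} \int \prod_{i=1}^{l_+} d\mathcal{Q}_{(0)}(\bar u_i^+) \prod_{i=1}^{l_-} d\mathcal{Q}_{(0)}(\bar u_i^-)\; \delta\Big(\bar h - \sum_{i=1}^{l_+} \bar u_i^+ + \sum_{i=1}^{l_-} \bar u_i^-\Big) \int \prod_{i=1}^{l_+} d\overline{Q}(u_i^+|\bar u_i^+) \prod_{i=1}^{l_-} d\overline{Q}(u_i^-|\bar u_i^-)\; \delta\Big(h - \sum_{i=1}^{l_+} u_i^+ + \sum_{i=1}^{l_-} u_i^-\Big)\, \frac{z_3(\{u_i^+\},\{u_i^-\})}{z_3(\{\bar u_i^+\},\{\bar u_i^-\})} . \tag{46} $$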

These equations are definitely simpler than the original ones (31), (32). In particular $\overline{P}(h|\bar h)\mathcal{P}_{(0)}(\bar h)$ can be viewed as a joint distribution of $(h, \bar h)$ and represented by a population of couples $\{(h_i, \bar h_i)\}_{i=1}^{\mathcal{N}}$. The presence of the reweighting factors still represents a difficulty that we shall now get rid of by a further simplification. Before proceeding, let us emphasize the identities

$$ \int d\overline{P}(h|\bar h) \tanh h = \tanh \bar h , \qquad \int d\overline{Q}(u|\bar u) \tanh u = \tanh \bar u , \tag{47} $$

which follow directly from the definition (45) and which are indeed preserved by the equations (46). We define now, for $\sigma = \pm 1$,

$$ P_\sigma(h|\bar h) = \frac{1 + \sigma \tanh h}{1 + \sigma \tanh \bar h}\; \overline{P}(h|\bar h) . \tag{48} $$

Using property (47), one can check that for any $\bar h$ and any $\sigma$, $P_\sigma(\cdot|\bar h)$ is well normalized, and that

$$ \overline{P}(h|\bar h) = \sum_\sigma \frac{1 + \sigma \tanh \bar h}{2}\; P_\sigma(h|\bar h) . \tag{49} $$



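Similar definitions and properties hold for $Q_\sigma(u|\bar u)$. Inserting these definitions in Eq. (46) one obtains

$$ Q_\sigma(u|\bar u)\, \mathcal{Q}_{(0)}(\bar u) = \int \prod_{i=1}^{k-1} d\mathcal{P}_{(0)}(\bar h_i)\; \delta\big(\bar u - f(\bar h_1,\dots,\bar h_{k-1})\big) \sum_{\sigma_1,\dots,\sigma_{k-1}} \tilde\mu(\sigma_1,\dots,\sigma_{k-1}|\sigma,\bar h_1,\dots,\bar h_{k-1}) \int \prod_{i=1}^{k-1} dP_{\sigma_i}(h_i|\bar h_i)\; \delta\big(u - f(h_1,\dots,h_{k-1})\big) , \tag{50} $$

where the summation runs over the $2^{k-1}$ configurations of the Ising spins $\sigma_1,\dots,\sigma_{k-1}$ with probabilities given by

$$ \tilde\mu(\sigma_1,\dots,\sigma_{k-1}|+,\bar h_1,\dots,\bar h_{k-1}) = \prod_{i=1}^{k-1} \frac{1 + \sigma_i \tanh \bar h_i}{2} , \tag{51} $$

$$ \tilde\mu(\sigma_1,\dots,\sigma_{k-1}|-,\bar h_1,\dots,\bar h_{k-1}) = \frac{1 - \mathbb{I}(\sigma_1 = \dots = \sigma_{k-1} = -)}{1 - \prod_{i=1}^{k-1} \frac{1 - \tanh \bar h_i}{2}} \prod_{i=1}^{k-1} \frac{1 + \sigma_i \tanh \bar h_i}{2} . \tag{52} $$

The second of the equations in (46) yields

$$ P_\sigma(h|\bar h)\, \mathcal{P}_{(0)}(\bar h) = \sum_{l_+,l_-=0}^{\infty} \frac{e^{-\alpha k} (\alpha k/2)^{l_+ + l_-}}{l_+!\, l_-!} \int \prod_{i=1}^{l_+} d\mathcal{Q}_{(0)}(\bar u_i^+) \prod_{i=1}^{l_-} d\mathcal{Q}_{(0)}(\bar u_i^-)\; \delta\Big(\bar h - \sum_{i=1}^{l_+} \bar u_i^+ + \sum_{i=1}^{l_-} \bar u_i^-\Big) \int \prod_{i=1}^{l_+} dQ_\sigma(u_i^+|\bar u_i^+) \prod_{i=1}^{l_-} dQ_{-\sigma}(u_i^-|\bar u_i^-)\; \delta\Big(h - \sum_{i=1}^{l_+} u_i^+ + \sum_{i=1}^{l_-} u_i^-\Big) . \tag{53} $$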
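The disappearance of the reweighting factors in Eqs. (50), (53) can be traced to two elementary identities, worked out below as a hedged check. The shorthand $\Pi$ and the explicit form $z_4 = 2 - \Pi$ (normalization of the clause-to-variable BP message) are assumptions of this sketch; the form of $f$ is the one used throughout.

```latex
% Shorthand (introduced here): \Pi = \prod_{i=1}^{k-1} \frac{1-\tanh h_i}{2},
% so that u = f(h_1,\dots,h_{k-1}) = -\tfrac12 \log(1-\Pi) and z_4 = 2 - \Pi (assumed).
\begin{align*}
e^{2u} = \frac{1}{1-\Pi}
  \quad &\Longrightarrow \quad
  \tanh u = \frac{e^{2u}-1}{e^{2u}+1} = \frac{\Pi}{2-\Pi} , \\
(1+\tanh u)\, z_4 &= \frac{2}{2-\Pi}\,(2-\Pi) = 2 , \\
(1-\tanh u)\, z_4 &= \frac{2(1-\Pi)}{2-\Pi}\,(2-\Pi) = 2\,(1-\Pi) .
\end{align*}
% Clause side: for \sigma = + the ratio z_4(h)/z_4(\bar h) cancels exactly against
% (1+\tanh u)/(1+\tanh \bar u), giving the product law (51); for \sigma = - one is
% left with (1-\Pi)/(1-\bar\Pi), i.e. the exclusion of the all-minus configuration in (52).
% Variable side: with h = \sum_i u_i^+ - \sum_i u_i^-,
\begin{align*}
(1+\sigma \tanh h)\, z_3(\{u_i^+\},\{u_i^-\})
  = 2 \prod_{i=1}^{l_+} (1+\sigma\tanh u_i^+) \prod_{i=1}^{l_-} (1-\sigma\tanh u_i^-) ,
\end{align*}
% which factorizes over the incoming fields and explains why Q_\sigma appears for the
% u_i^+ and Q_{-\sigma} for the u_i^- in Eq. (53).
```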



The equations (50), (53) are particularly convenient for numerical resolution. This can be obtained through an appropriate generalization of the population dynamics algorithm, that employs two populations of triples $\{(\bar h_i, h_i^+, h_i^-) : i = 1,\dots,\mathcal{N}\}$ and $\{(\bar u_j, u_j^+, u_j^-) : j = 1,\dots,\mathcal{N}\}$. In the actual implementation it is more convenient to store the hyperbolic tangent of these quantities, e.g. $\tanh \bar h_i$, $\tanh h_i^+$, etc. These populations are updated recursively according to the pseudocode below.

Population Dynamics m = 1 (Size $\mathcal{N}$, Iterations $t_{\max}$)

    For all $i \in \{1,\dots,\mathcal{N}\}$:
        Set $h_i^\pm = \pm\infty$ and draw $\bar h_i$ from $\mathcal{P}_{(0)}$;
    For all $t \in \{1,\dots,t_{\max}\}$:
        For all $j \in \{1,\dots,\mathcal{N}\}$ generate a new triple $(\bar u_j, u_j^+, u_j^-)$:
            Choose $k-1$ indices $i_1 \dots i_{k-1}$ uniformly in $[\mathcal{N}]$;
            Compute $\bar u_j = f(\bar h_{i_1},\dots,\bar h_{i_{k-1}})$;
            Generate a configuration $\sigma_1 \dots \sigma_{k-1}$ with the law $\tilde\mu(\cdots|+,\bar h_{i_1} \dots \bar h_{i_{k-1}})$ in Eq. (51);
            Compute $u_j^+ = f(h_{i_1}^{\sigma_1},\dots,h_{i_{k-1}}^{\sigma_{k-1}})$;
            Generate a second configuration of spins with the law (52);
            Set $u_j^- = f(h_{i_1}^{\sigma_1},\dots,h_{i_{k-1}}^{\sigma_{k-1}})$;
        End-For;
        For all $i \in \{1,\dots,\mathcal{N}\}$ generate a new triple $(\bar h_i, h_i^+, h_i^-)$:
            Draw two independent Poisson random variables $l_+$ and $l_-$ of mean $\alpha k/2$;
            Draw $l_+ + l_-$ iid indices $i_1^+,\dots,i_{l_+}^+,i_1^-,\dots,i_{l_-}^-$ uniformly random in $[\mathcal{N}]$;
            Set $\bar h_i = \sum_{m=1}^{l_+} \bar u_{i_m^+} - \sum_{m=1}^{l_-} \bar u_{i_m^-}$, $\; h_i^\pm = \sum_{m=1}^{l_+} u^\pm_{i_m^+} - \sum_{m=1}^{l_-} u^\mp_{i_m^-}$;
        End-For;
    End-For;



The justification of the initialization will be given below. After a moment of thought one can convince oneself that the above update rules are the correct discretization of Eqs. (50) and (53). More precisely, if the triples $(\bar h_i, h_i^+, h_i^-)$ are iid and the two pairs $(\bar h_i, h_i^+)$, $(\bar h_i, h_i^-)$ have distributions (respectively) $P_+(h^+|\bar h)\mathcal{P}_{(0)}(\bar h)$ and $P_-(h^-|\bar h)\mathcal{P}_{(0)}(\bar h)$, then the pairs $(\bar u_j, u_j^+)$, $(\bar u_j, u_j^-)$ resulting from the above update have distributions $Q_+(u^+|\bar u)\mathcal{Q}_{(0)}(\bar u)$, $Q_-(u^-|\bar u)\mathcal{Q}_{(0)}(\bar u)$. An analogous statement holds for the update from the triples $(\bar u_j, u_j^+, u_j^-)$ to $(\bar h_i, h_i^+, h_i^-)$§.

Most relevant observables can be written as expectations with respect to the distributions $P_\pm(h^\pm|\bar h)\mathcal{P}_{(0)}(\bar h)$, $Q_\pm(u^\pm|\bar u)\mathcal{Q}_{(0)}(\bar u)$ and hence estimated from these populations of triplets.
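A compact sketch of one sweep of this triple dynamics is given below. It is an illustration under stated assumptions rather than the code used for the figures: fields are stored directly (not their hyperbolic tangents), hard fields are represented by `math.inf`, the spin configurations are drawn by enumerating the $2^{k-1}$ possibilities (reasonable only for small $k$), and the names `m1_sweep`, `draw_sigmas`, `f`, `poisson` are hypothetical. The clamp in `f` and the tiny offset added to the weights are numerical guards.

```python
import itertools
import math
import random

def f(hs):
    # u = f(h_1..h_{k-1}); a hard field +inf arises only if every input is -inf
    if all(h == -math.inf for h in hs):
        return math.inf
    prod = 1.0
    for h in hs:
        prod *= (1.0 - math.tanh(h)) / 2.0
    return -0.5 * math.log(1.0 - min(prod, 1.0 - 1e-15))  # clamp: numerical guard

def poisson(mean, rng):
    # Knuth's multiplication method (adequate for moderate means)
    threshold, count, p = math.exp(-mean), 0, rng.random()
    while p > threshold:
        count += 1
        p *= rng.random()
    return count

def draw_sigmas(hbars, root_sign, rng):
    # law (51) if root_sign = +1; law (52), all-minus excluded, if root_sign = -1
    configs = [c for c in itertools.product((1, -1), repeat=len(hbars))
               if root_sign == 1 or 1 in c]
    weights = [math.prod((1.0 + s * math.tanh(h)) / 2.0 for s, h in zip(c, hbars))
               + 1e-300  # avoids an all-zero weight vector
               for c in configs]
    return rng.choices(configs, weights=weights)[0]

def m1_sweep(H, k, alpha, rng):
    """One iteration; H is a list of triples (hbar, h_plus, h_minus)."""
    n = len(H)
    slot = {1: 1, -1: 2}  # spin value -> position of h^sigma inside a triple
    U = []
    for _ in range(n):
        picked = [rng.choice(H) for _ in range(k - 1)]
        hbars = [t[0] for t in picked]
        s_plus = draw_sigmas(hbars, +1, rng)
        s_minus = draw_sigmas(hbars, -1, rng)
        U.append((f(hbars),
                  f([t[slot[s]] for t, s in zip(picked, s_plus)]),
                  f([t[slot[s]] for t, s in zip(picked, s_minus)])))
    new_H = []
    for _ in range(n):
        plus = [rng.choice(U) for _ in range(poisson(alpha * k / 2.0, rng))]
        minus = [rng.choice(U) for _ in range(poisson(alpha * k / 2.0, rng))]
        hbar = sum(t[0] for t in plus) - sum(t[0] for t in minus)
        h_plus = sum(t[1] for t in plus) - sum(t[2] for t in minus)
        h_minus = sum(t[2] for t in plus) - sum(t[1] for t in minus)
        new_H.append((hbar, h_plus, h_minus))
    return new_H
```

A run starts from triples of the form `(hbar_i, math.inf, -math.inf)` with `hbar_i` taken from an RS population, and applies `m1_sweep` repeatedly. In this sketch $u^-$ is always finite, because at least one of its arguments is an $h^+ > -\infty$, so the sums never produce an undefined $\infty - \infty$.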

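Notice that, by definition, the 1RSB potential computed at $m = 1$ is equal to the RS free-entropy, $\Phi(m = 1) = \phi_{(0)}$. The internal entropy can be expressed in terms of $\overline{P}(h|\bar h)$ and $\overline{Q}(u|\bar u)$ by integrating over $\mathcal{P}_{(1)}$, $\mathcal{Q}_{(1)}$ in Eq. (39). These conditional distributions can be further replaced by $P_\sigma$ and $Q_\sigma$ thanks to Eq. (49), yielding finally

$$ \begin{aligned} \phi_{\rm int}(m = 1) = & -\alpha k \int d\mathcal{P}_{(0)}(\bar h)\, d\mathcal{Q}_{(0)}(\bar u) \sum_\sigma \frac{(1 + \sigma \tanh \bar h)(1 + \sigma \tanh \bar u)}{2\,(1 + \tanh \bar h \tanh \bar u)} \int dP_\sigma(h|\bar h)\, dQ_\sigma(u|\bar u) \log z_1(u,h) \\ & + \alpha \int \prod_{i=1}^{k} d\mathcal{P}_{(0)}(\bar h_i) \sum_{\sigma_1,\dots,\sigma_k} \tilde\mu(\sigma_1,\dots,\sigma_k|\bar h_1,\dots,\bar h_k) \int \prod_{i=1}^{k} dP_{\sigma_i}(h_i|\bar h_i) \log z_2(h_1,\dots,h_k) \\ & + \sum_{l_+,l_-=0}^{\infty} \frac{e^{-\alpha k}(\alpha k/2)^{l_+ + l_-}}{l_+!\, l_-!} \int \prod_{i=1}^{l_+} d\mathcal{Q}_{(0)}(\bar u_i^+) \prod_{i=1}^{l_-} d\mathcal{Q}_{(0)}(\bar u_i^-) \sum_\sigma \frac{1 + \sigma \tanh\big(\sum_{i=1}^{l_+} \bar u_i^+ - \sum_{i=1}^{l_-} \bar u_i^-\big)}{2} \\ & \qquad \times \int \prod_{i=1}^{l_+} dQ_\sigma(u_i^+|\bar u_i^+) \prod_{i=1}^{l_-} dQ_{-\sigma}(u_i^-|\bar u_i^-) \log z_3(u_1^+,\dots,u_{l_+}^+,u_1^-,\dots,u_{l_-}^-) . \end{aligned} \tag{54} $$

In the second term the distribution of the configuration $(\sigma_1,\dots,\sigma_k)$ reads

$$ \tilde\mu(\sigma_1,\dots,\sigma_k|\bar h_1,\dots,\bar h_k) = \frac{1 - \mathbb{I}(\sigma_1 = \dots = \sigma_k = -)}{1 - \prod_{i=1}^{k} \frac{1 - \tanh \bar h_i}{2}} \prod_{i=1}^{k} \frac{1 + \sigma_i \tanh \bar h_i}{2} . \tag{55} $$

This expression of the internal free-entropy is readily evaluated by sampling from the population of triplets defined above; the complexity of the $m = 1$ states is then finally expressed as $\Sigma(m = 1) = \Phi(m = 1) - \phi_{\rm int}(m = 1)$.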

Consider now the definition of the overlaps given in Eq. (38). The inter-state one $q_0$ is easily seen to be equal to the RS one. Moreover $q_1$ can be written as

$$ q_1 = \int d\mathcal{P}_{(0)}(\bar h) \int d\overline{P}(h|\bar h) \tanh^2 h . \tag{56} $$

To rewrite $q_1$ in terms of the distribution $P_\sigma$, note that $\tanh^2 h = (\tanh h) \sum_\sigma \sigma (1 + \sigma \tanh h)/2$ and use (48) to obtain

$$ q_1 = \int d\mathcal{P}_{(0)}(\bar h) \sum_\sigma \sigma\, \frac{1 + \sigma \tanh \bar h}{2} \int dP_\sigma(h|\bar h) \tanh h . \tag{57} $$

These expressions make it possible to estimate $q_0$, $q_1$ from the population of triples $\{(\bar h_i, h_i^+, h_i^-)\}$.
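As a small illustration, the estimators corresponding to Eqs. (30) and (57) can be written as follows for a list of triples $(\bar h_i, h_i^+, h_i^-)$; the function name `overlaps` is hypothetical.

```python
import math

def overlaps(triples):
    """triples: list of (hbar, h_plus, h_minus). Returns the estimates (q0, q1)."""
    q0 = q1 = 0.0
    for hbar, h_plus, h_minus in triples:
        t = math.tanh(hbar)
        q0 += t * t
        # Eq. (57): sum over sigma of sigma * (1 + sigma*tanh(hbar))/2 * tanh(h^sigma)
        q1 += 0.5 * (1.0 + t) * math.tanh(h_plus) - 0.5 * (1.0 - t) * math.tanh(h_minus)
    n = len(triples)
    return q0 / n, q1 / n
```

Two limiting cases give a quick check: for the initial condition $h^\pm = \pm\infty$ one finds $q_1 = 1$, while for the trivial solution $h^+ = h^- = \bar h$ one finds $q_1 = q_0$.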

In Figs. 2 and 7 we followed this approach to plot the difference $q_1(\ell) - q_0$ for several values of $\alpha$ and $k = 3, 4$, whereby the population $\{(\bar h_i, h_i^+, h_i^-)\}$ is obtained after $\ell$ iterations of the above algorithm. For $\alpha < \alpha_d(k)$, $q_1(\ell) - q_0 \to 0$, while for $\alpha > \alpha_d(k)$ it is bounded away from 0. Let us emphasize the great simplification achieved: the equations (50), (53) are much simpler than the original 1RSB equations: they can be solved using a simple population of triples, instead of a population of populations. Further, the initialization used in the pseudocode above is the correct one, in the following sense. If the equations (50), (53) admit a non-trivial solution, then their iteration converges to a non trivial solution under such an initialization.

§ Notice that it would be wrong to claim that $(\bar h_i, h_i^+, h_i^-)$ is distributed according to $P_+(h^+|\bar h) P_-(h^-|\bar h) \mathcal{P}_{(0)}(\bar h)$: the update rules used in the algorithm induce correlations between (for instance) the fields $h^+$ and $h^-$ inside the same triplet. These correlations do not spoil our claim.
The last statement follows from the interpretation of the order parameters in terms of tree reconstruction. Consider an infinite tree $k$-satisfiability formula rooted at variable node $i$. The tree is random with distribution defined by letting each variable appear directly (resp. negated) in $l_+$ (resp. $l_-$) clauses, where $l_\pm$ are independent Poisson random variables with mean $\alpha k/2$. One can define a (uniform) free boundary Gibbs measure $\mu$ over SAT assignments of such a tree. Imagine now to generate a solution from this measure, conditional on the root value being $\sigma$, and denote by $\sigma_B$ the values of variables at distance at least $\ell$ from the root. Define the fields $\bar h$, $h^\ell$ by

$$ \mu(\sigma_i) = \frac{1 + \sigma_i \tanh \bar h}{2} , \qquad \mu(\sigma_i|\sigma_B) = \frac{1 + \sigma_i \tanh h^\ell}{2} . \tag{58} $$

Notice that both are random quantities, $\bar h$ because of the tree randomness and $h^\ell$ both because of the tree and of the random configuration $\sigma_B$. Let $P_\sigma^\ell(h|\bar h)$ be the conditional distribution of $h^\ell$ given $\bar h$.

It is not hard to show that $P_\sigma^\ell(h|\bar h)$ is the distribution obtained by iterating (50), (53) $\ell$ times with initial condition $P_\sigma^{0}(h|\bar h)\, \mathcal{P}_{(0)}(\bar h) = \mathcal{P}_{(0)}(\bar h)\, \delta(h - \sigma\infty)$. This corresponds indeed to the initialization we used in the population dynamics algorithm. It follows from the arguments in [41] that this is the correct initialization, in the sense described above. Further, under the usual assumptions of the cavity method and for $\alpha < \alpha_c(k)$, the quantity $q_1(\ell) - q_0$ plotted in Figs. 2 and 7 coincides with the correlation function $C_\ell$ in the large $N$ limit.



B. m = 0: Survey Propagation and the associated internal entropy

We turn now to the second particular case for which a simplified treatment of the 1RSB formalism is possible, namely at $m = 0$.

To begin with, let us consider the structure of the distributions $P(h)$ (resp. $Q(u)$) in the support of $\mathcal{P}_{(1)}$ (resp. $\mathcal{Q}_{(1)}$) for an arbitrary value of $m$. A moment of thought reveals the possibility of "hard fields" $h = \pm\infty$ that strictly constrain a variable to take the same value in all configurations of a cluster of solutions. We can take care explicitly of this possibility by denoting

$$ P(h) = x^- \delta(h + \infty) + x^+ \delta(h - \infty) + (1 - x^- - x^+)\, \widetilde{P}(h) , \qquad Q(u) = y\, \delta(u - \infty) + (1 - y)\, \widetilde{Q}(u) , \tag{59} $$

where $\widetilde{P}$ and $\widetilde{Q}$ have their support on finite values of the fields, that shall be called 'soft' or 'evanescent'. Rewriting the right hand side of (31) with these notations yields



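$$ Q(\cdot) \stackrel{d}{=} \frac{1}{Z_4[P_1,\dots,P_{k-1}]} \left\{ \prod_{i=1}^{k-1} x_i^-\; \delta(\cdot - \infty) + 2^m \left[ 1 - \prod_{i=1}^{k-1} (1 - x_i^+) \right] \delta(\cdot) + \sum_{|I| \ge 1} \prod_{i \in I} (1 - x_i^+ - x_i^-) \prod_{i \notin I} x_i^- \int \prod_{i \in I} d\widetilde{P}_i(h_i)\, (1 + e^{-2\,\cdot})^m\, \delta\left( \cdot + \frac{1}{2} \log\left( 1 - \prod_{i \in I} \frac{1 - \tanh h_i}{2} \right) \right) \right\} , \tag{60} $$

where the summation on $I$ is over the non empty subsets of $\{1,\dots,k-1\}$.

To achieve the same task for Eq. (32) it is advisable to introduce some more compact notations,

$$ \pi_\pm = \prod_{i=1}^{l_\pm} (1 - y_i^\pm) , \qquad S_\pm = \prod_{i=1}^{l_\pm} (1 + \tanh u_i^\pm) , \qquad T_\pm = \prod_{i=1}^{l_\pm} (1 - \tanh u_i^\pm) , \tag{61} $$

in terms of which we have for instance $z_3 = S_+ T_- + T_+ S_-$. We shall also denote $\mathbb{E}[\bullet]$ the average over the $u_i^\pm$ drawn from the $Q_i^\pm$, and $\widetilde{\mathbb{E}}$ similarly using $\widetilde{Q}_i^\pm$. We then obtain

(62)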
Analogously, the replicated free-entropy $\Phi(m)$ and its derivative can be rewritten by making explicit the distinction between hard and soft fields.

Consider now the previous equations with $m = 0$. As we have explicitly removed all the contradictory terms which had a strictly vanishing reweighting factor in the original relations (31), (32), all the terms raised to the power $m$ in Eqs. (60), (62) are strictly positive, hence these factors go to 1 when $m$ vanishes. Two important consequences are to be underlined: the normalization factors $Z_3$ and $Z_4$ do not depend on the evanescent distributions $\widetilde{P}$, $\widetilde{Q}$. In fact $Z_3 = \pi_+ + \pi_- - \pi_+\pi_-$ and $Z_4 = 1$. Moreover the equations on the intensity of the hard fields peaks decouple from the evanescent part when $m$ goes to 0, (60), (62) yielding for them

$$ y \stackrel{d}{=} \prod_{i=1}^{k-1} x_i^- , \qquad (x^+, x^-) \stackrel{d}{=} \left( \frac{(1 - \pi_+)\,\pi_-}{\pi_+ + \pi_- - \pi_+\pi_-} ,\; \frac{(1 - \pi_-)\,\pi_+}{\pi_+ + \pi_- - \pi_+\pi_-} \right) , \tag{63} $$

which are nothing but the probabilistic form of the Survey Propagation equations [12]. For future use we denote $Q_{\rm sp}(y)$ and $P_{\rm sp}(x^+, x^-)$ the distributions of these random variables. The complexity at $m = 0$ is $\Sigma(m = 0) = \Phi(m = 0)$ and can then be expressed from Eq. (34) as

$$ \Sigma(m = 0) = \Phi(m = 0) = -\alpha k\, E[\log(1 - x^- y)] + \alpha\, E\left[ \log\left( 1 - \prod_{i=1}^{k} x_i^- \right) \right] + E[\log(\pi_+ + \pi_- - \pi_+\pi_-)] , \tag{64} $$

where the average is done with respect to $P_{\rm sp}$ and $Q_{\rm sp}$.
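Equations (63) and (64) lend themselves to a very simple population dynamics, sketched below under stated assumptions: the populations are initialized with uniformly random $y$, contradictory draws (hard fields on both sides, $Z_3 = 0$) are simply redrawn, and the logarithms in the complexity are clamped as a numerical guard. All function names are hypothetical.

```python
import math
import random

def poisson(mean, rng):
    # Knuth's multiplication method (adequate for moderate means)
    threshold, count, p = math.exp(-mean), 0, rng.random()
    while p > threshold:
        count += 1
        p *= rng.random()
    return count

def sp_variable_update(y_plus, y_minus):
    """Second relation in Eq. (63). Returns (x_plus, x_minus, Z_3) or None if Z_3 = 0."""
    pi_p = math.prod(1.0 - y for y in y_plus)
    pi_m = math.prod(1.0 - y for y in y_minus)
    z3 = pi_p + pi_m - pi_p * pi_m
    if z3 == 0.0:
        return None  # contradictory hard fields on both sides
    return (1.0 - pi_p) * pi_m / z3, (1.0 - pi_m) * pi_p / z3, z3

def draw_ys(ys, k, alpha, rng):
    return [rng.choice(ys) for _ in range(poisson(alpha * k / 2.0, rng))]

def sp_population_dynamics(k, alpha, size=1000, sweeps=50, seed=0):
    rng = random.Random(seed)
    ys = [rng.random() for _ in range(size)]  # arbitrary non-trivial start
    xs = [(0.0, 0.0)] * size
    for _ in range(sweeps):
        for i in range(size):
            update = None
            while update is None:  # redraw contradictory neighborhoods
                update = sp_variable_update(draw_ys(ys, k, alpha, rng),
                                            draw_ys(ys, k, alpha, rng))
            xs[i] = (update[0], update[1])
        for j in range(size):
            # first relation in Eq. (63): y is the product of k-1 weights x^-
            ys[j] = math.prod(rng.choice(xs)[1] for _ in range(k - 1))
    return xs, ys

def sp_complexity(xs, ys, k, alpha, samples, rng):
    """Monte Carlo estimate of Eq. (64)."""
    def safe_log(v):
        return math.log(max(v, 1e-300))  # numerical guard
    t1 = t2 = t3 = 0.0
    for _ in range(samples):
        t1 += safe_log(1.0 - rng.choice(xs)[1] * rng.choice(ys))
        t2 += safe_log(1.0 - math.prod(rng.choice(xs)[1] for _ in range(k)))
        pi_p = math.prod(1.0 - y for y in draw_ys(ys, k, alpha, rng))
        pi_m = math.prod(1.0 - y for y in draw_ys(ys, k, alpha, rng))
        t3 += safe_log(pi_p + pi_m - pi_p * pi_m)
    return (-alpha * k * t1 + alpha * t2 + t3) / samples
```

On the trivial fixed point $y = 0$, $x^\pm = 0$ every term of Eq. (64) vanishes, so the estimate is exactly zero, which is a convenient check of the bookkeeping.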

By focusing on the intensity of the hard fields this 'energetic' version of the cavity method [l3,[l3l lost the information 
contained in the evanescent field distributions P, Q, which is necessary to obtain the internal entropy of the states, 
$\Phi'(m = 0)$. This quantity can however be obtained in a rather simple way. We shall indeed define $Q(u|y)$ as the
average of the evanescent part of $Q$ drawn from $\mathcal{Q}^{(1)}$, conditioned on the value of the hard field delta peak, and
similarly $P(h|x^+, x^-)$. As the right hand sides of (60-62) are linear functionals of these evanescent distributions when
$m = 0$, closed equations on these conditional averages can be obtained. We shall write them in terms of the joint
distributions $Q(u, y) = Q(u|y)\,Q_{\rm sp}(y)$ and $P(h, x^+, x^-) = P(h|x^+, x^-)\,P_{\rm sp}(x^+, x^-)$,



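$$
Q(u,y) = \int \prod_{i=1}^{k-1} dh_i\, dx_i^+ dx_i^-\, P(h_i, x_i^+, x_i^-) \ \delta\Big(y - \prod_{i=1}^{k-1} x_i^-\Big) \, \frac{1}{1-y} \Bigg[ \Big(1 - \prod_{i=1}^{k-1} (1 - x_i^+)\Big)\, \delta(u)
+ \sum_{\emptyset \neq S \subseteq \{1,\dots,k-1\}} \prod_{i \in S} (1 - x_i^+ - x_i^-) \prod_{i \notin S} x_i^- \ \delta\Big(u + \frac{1}{2} \log\Big(1 - \prod_{i \in S} \frac{1 - \tanh h_i}{2}\Big)\Big) \Bigg] \ , \tag{65}
$$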



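$$
P(h, x^+, x^-) = \sum_{l_+, l_-} p_{l_+} p_{l_-} \int \prod_{i=1}^{l_+} du_i^+ dy_i^+\, Q(u_i^+, y_i^+) \prod_{i=1}^{l_-} du_i^- dy_i^-\, Q(u_i^-, y_i^-) \
\delta\Big(x^+ - \frac{(1-\pi_+)\pi_-}{\pi_+ + \pi_- - \pi_+\pi_-}\Big)\, \delta\Big(x^- - \frac{(1-\pi_-)\pi_+}{\pi_+ + \pi_- - \pi_+\pi_-}\Big)\, \delta\Big(h - \sum_{i=1}^{l_+} u_i^+ + \sum_{i=1}^{l_-} u_i^-\Big) \ . \tag{66}
$$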

A solution of these equations can be obtained through a simple population dynamics algorithm, encoding $Q(u, y)$ as
a population of couples $\{(u_i, y_i)\}_{i=1}^{\mathcal{N}}$ and $P(h, x^+, x^-)$ as $\{(h_i, x_i^+, x_i^-)\}_{i=1}^{\mathcal{N}}$. The update rules of the algorithm can be
deduced from (65,66): a new element $(h, x^+, x^-)$ is obtained by drawing two Poisson random variables $l_\pm$ of mean $\alpha k/2$ and
$l_+ + l_-$ elements of the population $\{(u_i, y_i)\}$, and combining them according to (66). The translation of (65) is only
slightly more complicated. After extracting $k - 1$ elements at random from the population $\{(h_i, x_i^+, x_i^-)\}$ one obtains
$y$ as the product of the $k - 1$ elements $x_i^-$. One then draws a configuration $(s_1, \dots, s_{k-1}) \in \{-1, 0, +1\}^{k-1}$, each 'spin'
$s_i$ being $\pm 1$ with probability $x_i^\pm$ and $0$ with probability $1 - x_i^+ - x_i^-$, conditional on $(s_1, \dots, s_{k-1}) \neq (-1, \dots, -1)$. If
at least one of the $s_i$ is equal to $+1$ the new value of $u$ is set to 0, otherwise $u = -\frac{1}{2}\log(1 - \prod_i (1 - \tanh h_i)/2)$,
the product being taken over the indices $i$ such that $s_i = 0$.
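The update rules just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used for the results of the paper: the function names, the population size, the hard-field-dominated initial condition, the use of full parallel sweeps, and the treatment of the zero-weight corner cases ($Z_3 = 0$ and $y = 1$) are all choices made here for the example.

```python
import math
import random


def sample_poisson(mean, rng):
    """Poisson variate by Knuth's multiplication method (adequate for moderate means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def new_variable_triplet(pop_q, alpha, k, rng):
    """Build (h, x_plus, x_minus) from l_+ + l_- couples (u, y), following Eq. (66)."""
    while True:
        plus = [rng.choice(pop_q) for _ in range(sample_poisson(alpha * k / 2, rng))]
        minus = [rng.choice(pop_q) for _ in range(sample_poisson(alpha * k / 2, rng))]
        pi_plus = math.prod(1.0 - y for _, y in plus)
        pi_minus = math.prod(1.0 - y for _, y in minus)
        z3 = pi_plus + pi_minus - pi_plus * pi_minus
        if z3 > 0.0:  # assumption of this sketch: redraw the contradictory case z3 == 0
            break
    h = sum(u for u, _ in plus) - sum(u for u, _ in minus)
    return h, (1.0 - pi_plus) * pi_minus / z3, (1.0 - pi_minus) * pi_plus / z3


def new_clause_couple(pop_p, k, rng):
    """Build (u, y) from k - 1 triplets (h, x_plus, x_minus), following Eq. (65)."""
    picks = [rng.choice(pop_p) for _ in range(k - 1)]
    y = math.prod(x_minus for _, _, x_minus in picks)
    if y >= 1.0:
        return 0.0, 1.0  # assumption: the soft part has zero weight, u is a placeholder
    while True:  # rejection step: condition on (s_1, ..., s_{k-1}) != (-1, ..., -1)
        spins = []
        for _, x_plus, x_minus in picks:
            r = rng.random()
            spins.append(1 if r < x_plus else (-1 if r < x_plus + x_minus else 0))
        if any(s != -1 for s in spins):
            break
    if 1 in spins:
        return 0.0, y
    p = math.prod((1.0 - math.tanh(h)) / 2.0
                  for (h, _, _), s in zip(picks, spins) if s == 0)
    p = min(p, 1.0 - 1e-16)  # numerical guard against log(0)
    return -0.5 * math.log(1.0 - p), y


def population_dynamics(alpha, k, size=200, sweeps=5, seed=0):
    """Alternate full sweeps over the two populations (a simplification of this sketch)."""
    rng = random.Random(seed)
    pop_p = [(0.0, 0.45, 0.45)] * size  # assumption: start dominated by hard fields
    pop_q = []
    for _ in range(sweeps):
        pop_q = [new_clause_couple(pop_p, k, rng) for _ in range(size)]
        pop_p = [new_variable_triplet(pop_q, alpha, k, rng) for _ in range(size)]
    return pop_p, pop_q
```

A call such as `population_dynamics(alpha=9.0, k=4)` returns the two populations, from which averages over $P$ and $Q$ can be estimated. The rejection step in `new_clause_couple` is the simplest way to enforce the conditioning, but it becomes slow when $y$ is very close to 1.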
The internal entropy of the $m = 0$ pure states can be obtained from the solution of these equations, simplifying
Eq. dSSl) into 



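$$
\begin{aligned}
\phi_{\rm int}(m = 0) = & -\alpha k\, \mathbb{E}\left[ \frac{x^+ y \log 2 + (1 - y)\big(x^- \log(1 - \tanh u) + x^+ \log(1 + \tanh u)\big)}{1 - x^- y} \right] && (67) \\
& -\alpha k\, \mathbb{E}\left[ \frac{(1 - x^+ - x^-)\big(y \log(1 + \tanh h) + (1 - y) \log(1 + \tanh h \tanh u)\big)}{1 - x^- y} \right] && (68) \\
& +\alpha\, \mathbb{E}\left[ \frac{1}{1 - \prod_{i=1}^{k} x_i^-} \sum_{p=1}^{k} \binom{k}{p} \prod_{i=1}^{p} (1 - x_i^+ - x_i^-) \prod_{i=p+1}^{k} x_i^- \ \log\left(1 - \prod_{i=1}^{p} \frac{1 - \tanh h_i}{2}\right) \right] && (69) \\
& + \mathbb{E}\left[ \pi_+\pi_- \log(S_+T_- + T_+S_-) + \pi_-(1 - \pi_+) \log(S_+T_-) + \pi_+(1 - \pi_-) \log(T_+S_-) \right] \ , && (70)
\end{aligned}
$$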

where the expectation is over independent copies of elements drawn from $P(h, x^+, x^-)$ and $Q(u, y)$, and in the last
line (where we used the shorthand notations defined in (61)) over the Poissonian random variables $l_\pm$. This quantity
was plotted for $k = 4$ in Fig. 3.

Let us emphasize the great numerical simplification with respect to the general 1RSB equations: we have to deal
here with populations of couples (or triplets) of fields, not populations of populations. Yet we manage to extract not
only the complexity, which was the one computed in the probabilistic version of survey propagation, but also the
associated internal entropy.

VI. LARGE k RESULTS 

To complement the numerical resolution of the 1RSB equations, we present in this Section analytic expansions of
the various thresholds and thermodynamic quantities for large $k$. Some technical details of these computations are
deferred to Appendix B.



A. Dynamical transition regime 



A non-trivial solution of the 1RSB equations appears in the regime defined by

$$
\alpha = \frac{2^k}{k} \left[ \log k + \log\log k + \gamma + O\left(\frac{\log\log k}{\log k}\right) \right] \ , \tag{71}
$$



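with $\gamma$ finite as $k \to \infty$. In this regime the 1RSB distributional order parameters $\mathcal{P}^{(1)}$, $\mathcal{Q}^{(1)}$ are supported on cavity
field distributions of the form (55), with $P(\cdot)$, $Q(\cdot)$ supported on finite fields. The weights of the hard fields are
deterministic to leading order, with

$$
x^\pm = \frac{1}{2} - \frac{\delta}{2k\log k} + O\left(\frac{1}{k(\log k)^2}\right) \ , \qquad
y = \frac{1}{2^{k-1}} \left[ 1 - \frac{\delta}{\log k} + O\left(\frac{1}{(\log k)^2}\right) \right] \ . \tag{72}
$$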



A set of coupled equations can also be written for the averages of $P$, $Q$, in terms of which one computes a function
$\Lambda(\delta, m)$ that finally determines $\delta(\gamma, m)$ as a function of $\gamma$ by solving the following equation:

$$
\gamma = \delta + \log\frac{1}{2\delta} + \Lambda(\delta, m) \ . \tag{73}
$$



Both the expression for $\Lambda(\delta, m)$ and the equations for the averages of $P$, $Q$ are quite involved and we report them in
Appendix B. In any case the right hand side of Eq. (73) diverges for $\delta \to 0$ and $\delta \to \infty$. As a consequence a pair of
solutions appears for $\gamma > \gamma_d(m)$ (consistency arguments imply that the one with smaller $\delta$ must be selected), where
$\gamma_d(m)$ is obtained by minimizing the above expression over $\delta$. For $m = 0, 1$ the formulae simplify, yielding
$\Lambda(\delta, m = 0) = 0$ and $\Lambda(\delta, m = 1) = \log 2$ independently of $\delta$, whence the minimum
takes place at $\delta = 1$ for these two values of $m$.

To summarize, this yields the following estimate for the dynamical threshold

$$
\alpha_d(k, m) = \frac{2^k}{k} \left[ \log k + \log\log k + \gamma_d(m) + O\left(\frac{\log\log k}{\log k}\right) \right] \ , \tag{74}
$$

with $\gamma_d(m = 1) = 1$ and $\gamma_d(m = 0) = 1 - \log 2$. Notice that the transition at $m = 1$ occurs slightly after the one at
$m = 0$, in agreement with what is found numerically for small values of $k \ge 4$.
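As a quick numerical check of these values, the minimization over $\delta$ can be carried out in a few lines of Python. The sketch assumes that Eq. (73) reads $\gamma = \delta + \log(1/(2\delta)) + \Lambda(\delta, m)$, which is the form consistent with the quoted minimum at $\delta = 1$ and with $\gamma_d(0) = 1 - \log 2$, $\gamma_d(1) = 1$. It only covers $m = 0$ and $m = 1$, where $\Lambda$ is a constant. The function names and the search interval are choices made for this example.

```python
import math


def gamma_of_delta(delta, lam):
    """Right-hand side of Eq. (73) for a delta-independent Lambda (assumed form)."""
    return delta + math.log(1.0 / (2.0 * delta)) + lam


def minimize_unimodal(f, lo, hi, iterations=200):
    """Ternary search for the minimizer of a unimodal function on [lo, hi]."""
    for _ in range(iterations):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if f(a) < f(b):
            hi = b
        else:
            lo = a
    return 0.5 * (lo + hi)


def gamma_d(m):
    """Return (gamma_d, delta*) for m = 0 or m = 1, using Lambda = 0 or log 2."""
    lam = {0: 0.0, 1: math.log(2.0)}[m]
    delta_star = minimize_unimodal(lambda d: gamma_of_delta(d, lam), 1e-3, 10.0)
    return gamma_of_delta(delta_star, lam), delta_star


def alpha_d_leading(k, m):
    """Leading terms of Eq. (74); the O(log log k / log k) correction is dropped."""
    g, _ = gamma_d(m)
    return 2.0 ** k / k * (math.log(k) + math.log(math.log(k)) + g)
```

Because the neglected correction in Eq. (74) decays only as $\log\log k/\log k$, `alpha_d_leading` should be read as an asymptotic estimate rather than as a substitute for the numerical thresholds at small $k$.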



B. Intermediate regime 

Consider now the limit $k \to \infty$ with $\alpha = 2^k \hat\alpha$ for some fixed $\hat\alpha > 0$. On this scale the SAT/UNSAT phase transition
occurs at $\hat\alpha_s = \log 2 + O(2^{-k})$. We shall therefore assume $\hat\alpha \in (0, \log 2)$. In this regime it is convenient to use
again the decomposition (55), with at leading order $P(h) = \delta(h)$ and $x^\pm = (1/2)(1 - \hat x^\pm e^{-k\hat\alpha})$. From this Ansatz one
finds that $\hat x^\pm = 2^{m-1}$; it then follows that the 1RSB potential is asymptotically

$$
\Phi(m) = \log 2 - \hat\alpha + e^{-k\hat\alpha}(2^{m-1} - 1) + O(2^{-k}) \ . \tag{75}
$$

By differentiating this expression with respect to $m$ one obtains the internal entropy,

$$
\phi_{\rm int}(m) = e^{-k\hat\alpha}\, 2^{m-1} \log 2 + O(2^{-k}) \ , \tag{76}
$$

and defining a reduced quantity $\sigma$ by $\phi_{\rm int} = e^{-k\hat\alpha} (\log 2)\, \sigma$, we get the complexity function explicitly,

$$
\Sigma(\sigma) = \log 2 - \hat\alpha + e^{-k\hat\alpha}\, \widehat\Sigma(\sigma) + O(2^{-k}) \ , \qquad \widehat\Sigma(\sigma) = \sigma(1 - \log 2) - \sigma\log\sigma - 1 \ . \tag{77}
$$

Notice that, for large $k$, the internal entropy of states is exponentially smaller (in $k$) than the complexity. Further, to
leading order, the complexity vanishes at $\hat\alpha = \log 2$, independently of $m$.
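The step from Eq. (75) to Eq. (77) is a Legendre transform and can be spelled out in a few lines. The sketch below assumes only the standard relations $\phi_{\rm int} = \partial_m \Phi$ and $\Sigma = \Phi - m\,\phi_{\rm int}$, and drops the $O(2^{-k})$ corrections. Here $\hat\alpha = 2^{-k}\alpha$ denotes the rescaled clause density and $\widehat\Sigma$ the rescaled complexity of this subsection (the hatted symbols are notation chosen for this illustration).

```latex
\begin{align*}
\phi_{\rm int}(m) &= \partial_m \Phi(m) = e^{-k\hat\alpha}\, 2^{m-1} \log 2
  \quad\Longrightarrow\quad \sigma = 2^{m-1} , \qquad m \log 2 = \log 2 + \log\sigma , \\
\Sigma &= \Phi - m\, \phi_{\rm int}
  = \log 2 - \hat\alpha + e^{-k\hat\alpha} \left[ (\sigma - 1) - \sigma\, (\log 2 + \log\sigma) \right] \\
  &= \log 2 - \hat\alpha + e^{-k\hat\alpha} \left[ \sigma (1 - \log 2) - \sigma \log\sigma - 1 \right] .
\end{align*}
```

In this parametrization $m = 0$ and $m = 1$ correspond to $\sigma = 1/2$ and $\sigma = 1$ respectively.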



C. Condensation regime 

In order to resolve the separation between the condensation and satisfiability phase transitions we must let $k \to \infty$
with $\alpha \simeq 2^k \log 2$. More precisely, we define $\alpha = 2^k \log 2 - \zeta$, and take $k \to \infty$ with $\zeta$ fixed. Again, we use the Ansatz
(55) with, at leading order, $P(h) = \delta(h)$ and $x^\pm = (1/2)(1 - \hat x^\pm 2^{-k})$.

We then get the expansion of the potential,

$$
\Phi(m) = \frac{1}{2^k} \left\{ \zeta - \zeta_s + (2^m - 1)/2 \right\} + O(2^{-2k}) \ , \tag{78}
$$

with $\zeta_s = \frac{1}{2}(1 + \log 2)$. The entropy can be determined by differentiating the above with respect to $m$; defining the reduced
entropy density through $\phi_{\rm int} = 2^{-k} (\log 2)\, \sigma$, the complexity reads in this regime

$$
\Sigma(\sigma) = \frac{1}{2^k} \left\{ \zeta - \zeta_s + \sigma(1 - \log 2) - \sigma\log\sigma - \frac{1}{2} \right\} + O(2^{-2k}) \ . \tag{79}
$$

The condensation and satisfiability transitions are located by determining $\zeta$ such that $\Sigma(m) = 0$ for (respectively)
$m = 1$ and $m = 0$. We get

$$
\alpha_c(k) = 2^k \log 2 - \frac{3}{2}\log 2 + O(2^{-k}) \ , \qquad \alpha_s(k) = 2^k \log 2 - \frac{1 + \log 2}{2} + O(2^{-k}) \ . \tag{80}
$$

The thermodynamic value $m_s(\zeta)$ of the Parisi parameter between these two thresholds is obtained by minimizing
$\Phi(m)/m$. At the order of the expression of $\Phi(m)$ given above, $m_s(\zeta)$ is the solution of

$$
\zeta - \zeta_s = 2^{m-1} (2^{-m} - 1 + m \log 2) \ . \tag{81}
$$




FIG. 11: Condensation threshold in reduced units, $2^{-k}\alpha_c(k)$. Symbols: numerical determination by population dynamics
algorithm, see Tab. HI Lines: analytical large k expansion, truncated at the three first orders, see Eq. (|83p . 



In particular one finds close to the satisfiability transition

$$
m_s(\zeta) \simeq \frac{2\sqrt{\zeta - \zeta_s}}{\log 2} \ . \tag{82}
$$



A systematic expansion in powers of $2^{-k}$ of the satisfiability threshold $\alpha_s(k)$ has been performed up to seventh
order. The corresponding expansion for the condensation threshold $\alpha_c(k)$ is slightly more difficult, because of
the necessary control of the corrections to the evanescent field distributions. We thus contented ourselves with the
computation of the next order in the expansion,



ac(fc) = 2Mog2 



3 log 2 



6(log2)(log3)-7(log2)2^2 



(83) 



5(log2) 



3(log2)(log3) 5 log 2 



4 ■ ' 2 

This expression is compared in Fig. 11 with the numerical results for small $k$.
VII. CONCLUSION



The set of solutions of random $k$-satisfiability formulae exhibits a surprisingly rich structure, that has been explored
in a series of statistical mechanics studies 0, [l^ . Either implicitly or explicitly, these studies are based on defining 
a probability distribution over the solutions, and then analyzing its properties. While the most natural choice is the 
uniform measure, the authors of Ref. [10] achieved a great simplification (and a wealth of exact results) by implicitly
weighting each solution inversely to the size of the 'cluster' it belongs to. Since cluster sizes are exponential in the
number of variables, and have large deviations, this amounts to focusing on an exponentially small subset of solutions.

In this paper we resumed the (technically more challenging) task of studying the uniform measure and obtained
the first complete phase diagram (including replica symmetry breaking) in this setting. While we confirmed several
of the predictions in [10], our analysis unveiled a number of new phenomena:

1. There exists a critical value $\alpha_d(k)$ of the clause density that can be characterized in several equivalent ways: (i)
Divergence of auto-correlation time under Glauber dynamics; (ii) Divergence of point-to-set correlation length;
(iii) Appearance of bottlenecks between 'sizable' subsets of solutions. The value of $\alpha_d(k)$ is bigger than the
value obtained with the method of [10] (except for $k = 3$ where it is smaller).

2. While $\alpha_d(k)$ does not correspond to an actual thermodynamic phase transition, such a phase transition takes
place at a second threshold $\alpha_c(k) < \alpha_s(k)$ ($\alpha_s(k)$ being the satisfiability threshold). This manifests itself in two-point
correlations, as well as in the overlap distribution.

3. The phase diagram is qualitatively different for $k \ge 4$ and $k = 3$. The latter value has been most commonly
used in numerical simulations. This difference had not been recognized before because it does not show up in
the behavior of the maximal complexity $\Sigma(m = 0)$ investigated up to now.
A number of research directions are suggested by this refined understanding:

(a) We kept ourselves to 1RSB: it would be extremely interesting to investigate whether more complex hierarchical
(FRSB) structures can arise in the set of solutions. A first step in this direction would be to analyze the
stability [HO, IMIjUSI of the 1RSB Ansatz, in particular to clarify our numerical findings for $k = 3$. For $k \ge 4$ we
believe that our determination of $\alpha_d$ and $\alpha_c$ is not affected by FRSB, yet it might be that the pure states, for
some values of their internal entropy, are to be described by a FRSB structure.

(b) The dynamical threshold $\alpha_d(k)$ is expected to affect algorithms that satisfy detailed balance with respect to the
uniform measure over solutions (or its positive temperature version). Let us stress that it is likely not to have
any relation with more general local search algorithms (ssl . Is^ . Issl ] . It is an open problem to generalize the static
computations performed here to obtain meaningful predictions in those cases.

(c) Finally, the discovery of the condensation phase transition at $\alpha_c(k)$ suggests that belief propagation might
be effective in computing marginals up to this threshold, as the average of the 1RSB equations with $m = 1$
corresponds to BP. The possible use of this information in constructing solutions is discussed in [s^ [s^, [H, .
APPENDIX A: ON THE NUMERICAL RESOLUTION OF THE 1RSB CAVITY EQUATIONS



In this section we discuss some issues related to the numerical resolution of Eqs. (31), (32). As already mentioned,
the 1RSB order parameter $\mathcal{P}^{(1)}[P]$ is approximated by a sample of $\mathcal{N}$ populations, each composed of $\mathcal{N}'$ elements
hi J, i G [JV], j G [A/"']. The numerical results presented in this work have been obtained with Af = 10'' and Af' — 10^. 

The solution to the 1RSB cavity equations is found by an iterative procedure: starting from a "good" initial
guess for the fixed point solution, we iterate a sampled version of Eqs. (31), (32). After some iterations the sample
of populations converges to a stationary state with fluctuations of order $O(1/\sqrt{\mathcal{N}}, 1/\sqrt{\mathcal{N}'})$. Convergence to the
stationary regime is usually fast and may take around 10^ iterations in the worst cases we encountered. Once in the 
stationary regime, we keep iterating for at least 10^ steps. Meanwhile we take averages (over the populations and 
over the time evolution) of the quantities of interest. This considerably reduces statistical errors. 

Our actual numerical implementation makes use of two transformations with respect to Eqs. (31), (32). First, we
make a change of variables into

$$
\varphi = e^{-2u} \ , \qquad \psi = \frac{1 + \tanh(h)}{2} \ , \tag{A1}
$$



both taking values in $[0, 1]$ (note that the variable $u$ is defined non-negative, see the definition of the function
$f(h_1, \dots, h_{k-1})$ in Eq. (23)). Moreover we exploit the fact that the reweighting term $Z_4(h_1, \dots, h_{k-1})$ in Eq. (31)
is a function of $u = f(h_1, \dots, h_{k-1})$ (cf. Eq. (33)). This allows one to transfer all the effects of reweighting to the other
equation. Denoting $Q(\varphi)$ and $P(\psi)$ the new distributions, these two transformations lead to



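$$
Q(\varphi) = \int \prod_{i=1}^{k-1} dP_i(\psi_i) \ \delta\Big(\varphi - 1 + \prod_{i=1}^{k-1} (1 - \psi_i)\Big) \ , \tag{A2}
$$

$$
P(\psi) = \frac{1}{Z} \int \prod_{i=1}^{l_+} dQ_i^+(\varphi_i^+) \prod_{i=1}^{l_-} dQ_i^-(\varphi_i^-) \ \Big(\prod_{i=1}^{l_+} \varphi_i^+ + \prod_{i=1}^{l_-} \varphi_i^-\Big)^m \ \delta\Big(\psi - \frac{\prod_{i=1}^{l_-} \varphi_i^-}{\prod_{i=1}^{l_+} \varphi_i^+ + \prod_{i=1}^{l_-} \varphi_i^-}\Big) \ , \tag{A3}
$$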



where Z in the last equation is obtained by normalization. 

One delicate issue in solving this kind of equation is how to represent faithfully the left hand side of Eq. (A3)
by a sample of $\mathcal{N}'$ representative elements of $P$, because of the reweighting term $(\prod_i \varphi_i^+ + \prod_i \varphi_i^-)^m$. A possible
solution [23] consists in first generating a larger number, say $5\mathcal{N}'$, of outgoing fields, storing them along with the
associated weights, and then performing a resampling step to extract $\mathcal{N}'$ elements from this intermediate population.
This approach has the advantage of having a complexity independent of the distributions $Q_i$. Unfortunately, if the weights
are strongly concentrated on a small subset of the $5\mathcal{N}'$ fields, the resampled population will have many copies of these
elements. This leads to a deterioration of the sample.

We adopted a different strategy whose running time depends on how strong the reweighting is. For $m > 0$, we
generate fields sequentially, and include them in the new population with probability proportional to the reweighting
factor (divided by the normalization factor $2^m$). This procedure becomes slower when $m$ grows, but it ensures that
no repetitions appear in the new sample.
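The sequential acceptance rule can be sketched as follows in Python. This is a minimal illustration under stated assumptions, not the code used for the paper: it assumes that the reweighting factor of a candidate is $(\prod_i \varphi_i^+ + \prod_i \varphi_i^-)^m$, as in the text above, and that the corresponding outgoing field is $\psi = \prod_i \varphi_i^- / (\prod_i \varphi_i^+ + \prod_i \varphi_i^-)$. The input sampler `toy_inputs` is a hypothetical stand-in for drawing the incoming fields from the populations.

```python
import math
import random


def accept_reject_population(draw_inputs, m, size, rng):
    """Fill a population of psi values; a candidate is kept with probability (w / 2) ** m."""
    population = []
    while len(population) < size:
        phi_plus, phi_minus = draw_inputs(rng)
        prod_plus = math.prod(phi_plus)
        prod_minus = math.prod(phi_minus)
        weight = prod_plus + prod_minus  # lies in [0, 2], hence the normalization by 2
        if weight <= 0.0:
            continue  # zero-weight candidate: never accepted for m > 0
        if rng.random() < (weight / 2.0) ** m:
            population.append(prod_minus / weight)
    return population


def toy_inputs(rng, degree=3):
    """Hypothetical input sampler: `degree` fields on each side, uniform in [0.5, 1)."""
    phi_plus = [0.5 + 0.5 * rng.random() for _ in range(degree)]
    phi_minus = [0.5 + 0.5 * rng.random() for _ in range(degree)]
    return phi_plus, phi_minus
```

Every accepted element comes from an independent draw, so no repetitions appear, at the price of a number of trials per accepted element that grows as the typical value of $(w/2)^m$ decreases, that is, as $m$ grows.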

Solving the equations for $m < 0$ is instead much easier and no particular care is needed. For the sake of simplicity
we have used the same algorithm as for $m > 0$ (which now produces many repetitions in the populations) and we
have simply checked the validity of our results by changing the number and size of populations.

As explained in Sec. IV B the cavity field distributions can have a positive weight on "hard" fields, i.e. on fields
that constrain a variable to take either value $+1$ or $-1$ in all solutions of the cluster. This corresponds to $\varphi = 0$, or
$\psi \in \{0, 1\}$. This would show up as a positive fraction of the sample taking value, say, $\varphi = 0$, thus leading to an
inefficient representation.

In order to circumvent this problem, we kept track explicitly of the weights on $\varphi = 0$ and $\psi \in \{0, 1\}$, in analogy
with Eqs. ([Sn| . ([S^ . This also allows one to locate more precisely the appearance of a positive fraction of hard fields in
the distributions, as discussed in Sec. IV C. There is unfortunately one drawback to this approach. Consider Eq. (A3)
and suppose that all the distributions $Q$ of the right hand sides are supported on $\varphi \in (0, 1]$. By definition the fields $\psi$
thus generated are also strictly positive. However the degrees $l_\pm$ are of order $\alpha k$ (i.e. around 40 for 4-SAT in the 1RSB
regime). As a consequence, it may happen that the product of the $l_-$ fields $\varphi^-$ is smaller than the smallest number
in the computer representation used (using 64 bits and the denormalized floating point notation this limit is roughly
$\psi_{\min} \approx 5 \cdot 10^{-324}$). How should one treat such cases? We have adopted the solution of ignoring, that is not including
in the population, any number below $\psi_{\min}$. This solution is equivalent to saying that we are describing with a finite
population of numbers the distribution $P(\psi)$ not on the domain $\psi \in (0, 1]$, but on the domain $\psi \in (\psi_{\min}, 1]$.

A different solution could be to convert all the numbers smaller than $\psi_{\min}$ to zero. We have tried this procedure,
but it seems to be unstable, and to introduce systematic errors. In particular one obtains a positive weight for $\psi = 0$,
even for values of the parameters for which this is inconsistent.
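The underflow described above is easy to reproduce. The Python snippet below is only an illustration of the discard rule under stated assumptions: it takes $\psi = \prod_i \varphi_i^- / (\prod_i \varphi_i^+ + \prod_i \varphi_i^-)$ as the outgoing field, uses the standard 64-bit floats of the language, and the function name `psi_or_none` is hypothetical.

```python
import math

PSI_MIN = 5e-324  # smallest positive (denormalized) 64-bit float


def psi_or_none(phi_plus, phi_minus):
    """Return psi, or None when the result is not representable as a positive float."""
    numerator = math.prod(phi_minus)
    denominator = math.prod(phi_plus) + numerator
    if denominator == 0.0:
        return None  # both products underflowed: nothing to keep
    psi = numerator / denominator
    # Any positive float is >= PSI_MIN, so this test catches exactly the underflow to 0.0.
    return psi if psi >= PSI_MIN else None
```

With 40 incoming fields all equal to $10^{-9}$ on the minus side, the exact product $10^{-360}$ lies far below $\psi_{\min}$, the floating point product is `0.0`, and the candidate is dropped instead of being stored as a spurious hard field $\psi = 0$.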

The last point we would like to discuss is the problem of how to initialize the population dynamics algorithm. It
is clear that an iterative procedure does in general lead to different solutions depending on the starting point of the
iterations. For instance the RS solution, where the distributions $P(h)$ in $\mathcal{P}^{(1)}$ are concentrated on a single value of $h$, is
always a fixed point of the 1RSB equations. In the case $m = 1$, the interpretation in terms of tree reconstruction [26]
leads to a clear prescription for this initialization, as explained in more detail in Sec. IV A. One can follow the same
procedure for other values of $m$, namely initialize the populations with essentially only hard fields. This is crucial in
particular for $k = 3$, where softer initial conditions lead to an unphysical fixed point, cf. App. C.



APPENDIX B: LARGE k ANALYSIS: SOME TECHNICAL DETAILS 

In this appendix we provide the complete formulae for the dynamical transition regime of Section VI A. To leading
order one can write a set of coupled equations for the average of $P(\cdot)$, $Q(\cdot)$ over the 1RSB order parameters $\mathcal{P}^{(1)}$,
$\mathcal{Q}^{(1)}$. With a slight abuse of notation we shall keep denoting by $P$, $Q$ such averages. In terms of these quantities we
have

$$
\Lambda(\delta, m) = -\log\left\{ \int dP(h) \left( \frac{1 + \tanh h}{2} \right)^m \right\} \ , \tag{B1}
$$

where the dependence on $\delta$ is through $P$. Notice that $\Lambda(\delta, m = 0) = 0$ independently of $P$. For $m = 1$ one can use
the fact that by symmetry $\int dP(h) \tanh h = 0$, to deduce $\Lambda(\delta, m = 1) = \log 2$.
For $m \neq 0, 1$ one has to determine the distributions $P$ and $Q$. It turns out that







FIG. 12: Inter-state and intra-state overlaps, $q_0$ and $q_1$, for $k = 3$ and some values of the Parisi parameter $m$. Data below (resp.
above) the RS line are for $q_0$ (resp. $q_1$). Full (resp. open) symbols refer to data measured while increasing (resp. decreasing) $\alpha$.



The distributions P, Q' are solutions of the coupled equations 



1 ~ /•'- ~ i- 

P{h) - -E,^ W'^Q'Wt) mAQ\u-)zM.---,nt^.n^,...,uYr 5\h-Y,ut 

^ A — 1 — 1 — 1 A — 1 



(B3) 
(B4) 



where in the second equation Z is a normalizing factor and l± are two independent Poisson random variables of mean 
w/2. 

APPENDIX C: NON-UNIQUENESS OF SOLUTIONS OF THE 1RSB EQUATIONS FOR k = 3



This appendix provides further details on the numerical solution of the 1RSB equations for $k = 3$. A difficulty that
arises in this case is the presence, for some values of $\alpha$ and $m$, of at least two distinct non-trivial solutions of the
1RSB equations (this has been already noticed in [l^ for $\alpha = 4.2$, and in [l^ for the related coloring problem). As
a consequence the initial conditions of the iterative resolution play an important role in selecting the fixed point that
shall be reached.

One can justify the existence of multiple solutions as follows. As mentioned in the main text, the continuous dynamical
transition at $\alpha_d \approx 3.86$ corresponds to a local instability of the RS solution with respect to 1RSB perturbations. It
is important to underline that this instability condition is independent of the value of $m$, that is at $\alpha_d$ a new solution
of the 1RSB equations should grow continuously away from the RS one, for all values of $m$. This is illustrated in
Fig. 12 where the overlaps $q_0$ and $q_1$ meet at $\alpha_d$ for various values of $m$. By continuity these solutions do not contain
hard fields in the neighborhood of $\alpha_d$. On the contrary it is known since [10] that another solution of the $m = 0$
equations, with a finite weight on hard fields, arises discontinuously at $\alpha \approx 3.92$. For larger values of $\alpha$ these two
solutions thus coexist (see the footnote below). A natural conjecture is that two solutions also coexist for $m \neq 0$. The iterative population
dynamics algorithm converges to one of them depending on the initialization (more precisely, on the fraction of hard
fields in the initial populations).

Footnote: Let us signal a peculiarity of the $m = 0$ 'soft' solution. It is easy to realize from Eqs. (60-62) that the averages of the distributions $P(h)$
and $Q(u)$ in this solution verify the RS equations. In consequence its intra-overlap $q_1$ coincides with the RS overlap, its complexity
vanishes and its internal entropy equals the RS one.

FIG. 13: The internal entropy should be a non-decreasing function of $m$ if the solution is consistent. Filled (resp. empty)
symbols refer to solutions with $\partial_m \phi > 0$ (resp. $\partial_m \phi < 0$), for $k = 3$.

FIG. 14: The entropic complexity for $k = 3$ and $\alpha = 4.2$. The two different branches correspond to the consistent (full
line) and inconsistent solution (dashed line).

Our data suggest that the interval of $\alpha$ in which the two solutions coexist shrinks when $m$ grows from 0. For instance
in Fig. 12 one clearly sees two branches for $m = 0.2$ at high enough values of $\alpha$, whereas for $m = 0.6$ the two curves
obtained by increasing and decreasing $\alpha$ at fixed $m$ are superimposed within numerical precision.

It remains to understand which, if any, of these solutions is the correct one. In principle one should test their
stability with respect to higher levels of replica symmetry breaking jsol . IsiL Is^ ] , however this is an extremely demanding
numerical task that we did not undertake. A simpler consistency argument can be invoked by computing the internal
entropy of the pure states. This should be an increasing function of $m$. We can see on the curves of Fig. 13 that this
condition is not respected for all the values of $\alpha$ and $m$ (full symbols refer to consistent solutions, while open symbols
are for inconsistent ones). For values of $\alpha$ smaller than roughly 4.15 we are not able to find a consistent solution
in the whole range of $m \in [0, 1]$ (a consistent solution exists only for $m$ large enough), while for $\alpha$ roughly larger
than 4.15 two solutions coexist at small values of $m$ and the consistent one is the one with more hard fields. We
also notice that this inconsistency is accompanied by a decrease of the inter-overlap $q_0$ with $\alpha$: in other words we
empirically find that the quantities $\partial_m \phi$ and $\partial_\alpha q_0$ always have the same sign. This observation makes it easier to locate
in Fig. 12 consistent solutions (those with $q_0$ increasing with $\alpha$). In order to make connection with previous studies
where consistent and inconsistent solutions were found [§, [isL [SJ] we plot in Fig. 14 the entropic complexity curve for
$\alpha = 4.2$: the full (resp. dashed) curve corresponds to the consistent (resp. inconsistent) branch.